A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

A comprehensive survey on trustworthy recommender systems

W Fan, X Zhao, X Chen, J Su, J Gao, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
As one of the most successful AI-powered applications, recommender systems aim to help
people make appropriate decisions in an effective and efficient way, by providing …

SpAtten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing performance superior to convolutional and recurrent …

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to
large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

The architectural implications of Facebook's DNN-based personalized recommendation

U Gupta, CJ Wu, X Wang, M Naumov… - … Symposium on High …, 2020 - ieeexplore.ieee.org
The widespread application of deep learning has changed the landscape of computation in
data centers. In particular, personalized recommendation for content ranking is now largely …

RecNMP: Accelerating personalized recommendation with near-memory processing

L Ke, U Gupta, BY Cho, D Brooks… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Personalized recommendation systems leverage deep learning models and account for the
majority of data center AI cycles. Their performance is dominated by memory-bound sparse …

DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference

U Gupta, S Hsia, V Saraph, X Wang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …

RecSSD: near data processing for solid state drive based recommendation inference

M Wilkening, U Gupta, S Hsia, C Trippel… - Proceedings of the 26th …, 2021 - dl.acm.org
Neural personalized recommendation models are used across a wide variety of datacenter
applications including search, social media, and entertainment. State-of-the-art models …

Understanding training efficiency of deep learning recommendation models at scale

B Acun, M Murphy, X Wang, J Nie… - … Symposium on High …, 2021 - ieeexplore.ieee.org
The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …

Near-memory processing in action: Accelerating personalized recommendation with AxDIMM

L Ke, X Zhang, J So, JG Lee, SH Kang, S Lee… - IEEE Micro, 2021 - ieeexplore.ieee.org
Near-memory processing (NMP) is a prospective paradigm enabling memory-centric
computing. By moving the compute capability next to the main memory (DRAM modules), it …