TensorDIMM: A practical near-memory processing architecture for embeddings and tensor operations...

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier

In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

Cited by 68 - Related articles - All 7 versions

A comprehensive survey on trustworthy recommender systems

W Fan, X Zhao, X Chen, J Su, J Gao, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

As one of the most successful AI-powered applications, recommender systems aim to help
people make appropriate decisions in an effective and efficient way, by providing …

Cited by 38 - Related articles - All 3 versions

SpAtten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org

The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance than convolutional and recurrent …

Cited by 305 - Related articles - All 6 versions

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org

Generative large language model (LLM) applications are growing rapidly, leading to
large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Cited by 48 - Related articles - All 2 versions

The architectural implications of Facebook's DNN-based personalized recommendation

U Gupta, CJ Wu, X Wang, M Naumov… - … Symposium on High …, 2020 - ieeexplore.ieee.org

The widespread application of deep learning has changed the landscape of computation in
data centers. In particular, personalized recommendation for content ranking is now largely …

Cited by 309 - Related articles - All 10 versions

RecNMP: Accelerating personalized recommendation with near-memory processing

L Ke, U Gupta, BY Cho, D Brooks… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org

Personalized recommendation systems leverage deep learning models and account for the
majority of data center AI cycles. Their performance is dominated by memory-bound sparse …

Cited by 215 - Related articles - All 11 versions

DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference

U Gupta, S Hsia, V Saraph, X Wang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org

Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …

Cited by 192 - Related articles - All 12 versions

RecSSD: near data processing for solid state drive based recommendation inference

M Wilkening, U Gupta, S Hsia, C Trippel… - Proceedings of the 26th …, 2021 - dl.acm.org

Neural personalized recommendation models are used across a wide variety of datacenter
applications including search, social media, and entertainment. State-of-the-art models …

Cited by 101 - Related articles - All 6 versions

Understanding training efficiency of deep learning recommendation models at scale

B Acun, M Murphy, X Wang, J Nie… - … Symposium on High …, 2021 - ieeexplore.ieee.org

The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …

Cited by 106 - Related articles - All 5 versions

Near-memory processing in action: Accelerating personalized recommendation with AxDIMM

L Ke, X Zhang, J So, JG Lee, SH Kang, S Lee… - IEEE Micro, 2021 - ieeexplore.ieee.org

Near-memory processing (NMP) is a prospective paradigm enabling memory-centric
computing. By moving the compute capability next to the main memory (DRAM modules), it …

Cited by 83 - Related articles - All 4 versions