A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …
A comprehensive survey on trustworthy recommender systems
As one of the most successful AI-powered applications, recommender systems aim to help
people make appropriate decisions in an effective and efficient way, by providing …
SpAtten: Efficient sparse attention architecture with cascade token and head pruning
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance over convolutional and recurrent …
Splitwise: Efficient generative LLM inference using phase splitting
Generative large language model (LLM) applications are growing rapidly, leading to large-scale
deployments of expensive and power-hungry GPUs. Our characterization of LLM …
The architectural implications of Facebook's DNN-based personalized recommendation
The widespread application of deep learning has changed the landscape of computation in
data centers. In particular, personalized recommendation for content ranking is now largely …
RecNMP: Accelerating personalized recommendation with near-memory processing
Personalized recommendation systems leverage deep learning models and account for the
majority of data center AI cycles. Their performance is dominated by memory-bound sparse …
DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference
Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting a significant compute demand on cloud infrastructure. Thus …
RecSSD: Near data processing for solid state drive based recommendation inference
Neural personalized recommendation models are used across a wide variety of datacenter
applications including search, social media, and entertainment. State-of-the-art models …
Understanding training efficiency of deep learning recommendation models at scale
The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …
Near-memory processing in action: Accelerating personalized recommendation with AxDIMM
Near-memory processing (NMP) is a prospective paradigm enabling memory-centric
computing. By moving the compute capability next to the main memory (DRAM modules), it …