A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

J Cho, M Kim, H Choi, G Heo, J Park - arXiv preprint arXiv:2408.05499, 2024 - arxiv.org
Recently, there have been extensive research efforts in building efficient large language
model (LLM) inference serving systems. These efforts not only include innovations in the …

A 3D-stack DRAM-based PNM architecture design

Q Zhou, B Wang, XT Xiao - Integration, 2025 - Elsevier
The article examines methods for integrating 3D-stacked DRAM with AI logic chips in order
to overcome the memory bandwidth challenges faced in AI inference of transformer …

LLM-Sim: A Simulation Infrastructure for LLM Inference Serving Systems

J Cho, M Kim, H Choi, J Park - Machine Learning for Computer … - openreview.net
Recently, there has been a large research effort in building efficient large language model
(LLM) inference serving systems, including advancements in both hardware and software …

LLMServingSim: A Simulation Infrastructure for LLM Inference Serving Systems

J Cho, M Kim, H Choi, J Park - jongse-park.github.io
Recently, there has been a large research effort in building efficient large language model
(LLM) inference serving systems, including advancements in both hardware and software …