A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

J Cho, M Kim, H Choi, G Heo, J Park - arXiv preprint arXiv:2408.05499, 2024 - arxiv.org
Recently, there have been extensive research efforts in building efficient large language
model (LLM) inference serving systems. These efforts not only include innovations in the …

A 3D-stack DRAM-based PNM architecture design

Q Zhou, B Wang, XT Xiao - Integration, 2025 - Elsevier
The article examines methods for integrating 3D-stacked DRAM with AI logic chips in order
to overcome the memory bandwidth challenges faced in AI inference of transformer …

LLM-Sim: A Simulation Infrastructure for LLM Inference Serving Systems

J Cho, M Kim, H Choi, J Park - Machine Learning for Computer … - openreview.net
Recently, there has been a large research effort in building efficient large language model
(LLM) inference serving systems, including advancements in both hardware and software …

LLMServingSim: A Simulation Infrastructure for LLM Inference Serving Systems

J Cho, M Kim, H Choi, J Park - jongse-park.github.io
Recently, there has been a large research effort in building efficient large language model
(LLM) inference serving systems, including advancements in both hardware and software …