SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
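
For illustration, a minimal NumPy sketch of the dense-and-sparse decomposition named in the title: a small fraction of large-magnitude outlier weights is kept in a full-precision sparse matrix, and the remaining narrow-range dense part is quantized to low bit width. This is a hedged sketch of the general idea using uniform quantization; SqueezeLLM itself uses non-uniform, sensitivity-weighted quantization, and every name below is hypothetical.

```python
# Sketch only: dense-and-sparse weight split (uniform quantization,
# not SqueezeLLM's actual non-uniform scheme).
import numpy as np

def dense_and_sparse_split(W, outlier_frac=0.005, n_bits=4):
    """Split W into a low-bit dense part plus a full-precision sparse outlier part."""
    k = max(1, int(W.size * outlier_frac))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]   # k-th largest magnitude
    outlier_mask = np.abs(W) >= thresh
    sparse = np.where(outlier_mask, W, 0.0)            # outliers kept in full precision
    dense = np.where(outlier_mask, 0.0, W)             # outlier-free, narrow range
    scale = np.abs(dense).max() / (2 ** (n_bits - 1) - 1)
    q = np.round(dense / scale).astype(np.int8)        # uniform low-bit codes
    return q, scale, sparse

W = np.random.randn(128, 128).astype(np.float32)
q, scale, sparse = dense_and_sparse_split(W)
W_hat = q.astype(np.float32) * scale + sparse
print("max reconstruction error:", np.abs(W - W_hat).max())
```

Removing the outliers first is what makes the low-bit dense range tight; the sparse part stays cheap because it holds only a fraction of a percent of the entries.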

Two-Dimensional Materials for Brain-Inspired Computing Hardware

S Hadke, MA Kang, VK Sangwan… - Chemical Reviews, 2025 - ACS Publications
Recent breakthroughs in brain-inspired computing promise to address a wide range of
problems from security to healthcare. However, the current strategy of implementing artificial …

BiLLM: Pushing the limit of post-training quantization for LLMs

W Huang, Y Liu, H Qin, Y Li, S Zhang, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained large language models (LLMs) exhibit exceptional general language processing
capabilities but come with significant demands on memory and computational resources. As …
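
As a point of reference for what pushing quantization toward 1 bit means, here is the classical scaled-sign binarization: with B = sign(W), the scale α = mean|W| minimizes ‖W − αB‖_F. This is not BiLLM's algorithm, which additionally separates salient weights and uses residual approximation; it is only a baseline sketch, and the helper name is hypothetical.

```python
# Sketch only: classical scaled-sign binarization (baseline, not BiLLM).
import numpy as np

def binarize(W):
    B = np.sign(W)
    B[B == 0] = 1.0           # keep codes strictly in {-1, +1}
    alpha = np.abs(W).mean()  # optimal Frobenius-norm scale for sign codes
    return B.astype(np.int8), alpha

W = np.random.randn(64, 64).astype(np.float32)
B, alpha = binarize(W)
W_hat = alpha * B.astype(np.float32)
print("per-weight MSE:", np.mean((W - W_hat) ** 2))
```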

Survey of CPU and memory simulators in computer architecture: A comprehensive analysis including compiler integration and emerging technology applications

I Hwang, J Lee, H Kang, G Lee, H Kim - Simulation Modelling Practice and …, 2024 - Elsevier
In computer architecture studies, simulators are crucial for design verification, reducing
research and development time and ensuring the high accuracy of verification results …

Fast matrix multiplications for lookup table-quantized LLMs

H Guo, W Brandon, R Cholakov… - arXiv preprint arXiv …, 2024 - arxiv.org
The deployment of large language models (LLMs) is often constrained by memory
bandwidth, where the primary bottleneck is the cost of transferring model parameters from …
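
A minimal sketch of what "lookup table-quantized" means here: 4-bit codes index a 16-entry table of centroids, and the dequantized weights feed an ordinary GEMM. The paper's contribution is a fused GPU kernel that avoids materializing the dequantized matrix; the NumPy version below only illustrates the data layout and is entirely hypothetical.

```python
# Sketch only: LUT dequantization followed by a plain matmul.
import numpy as np

rng = np.random.default_rng(0)
lut = np.sort(rng.standard_normal(16)).astype(np.float32)   # 16 centroids for 4-bit codes
codes = rng.integers(0, 16, size=(256, 256), dtype=np.uint8)

x = rng.standard_normal((1, 256)).astype(np.float32)
W_hat = lut[codes]    # table lookup recovers approximate weights
y = x @ W_hat.T       # standard GEMM on the dequantized matrix
print(y.shape)        # (1, 256)
```

Storing 4-bit codes instead of 16-bit weights is what relieves the memory-bandwidth bottleneck the snippet describes.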

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

W An, X Bi, G Chen, S Chen, C Deng… - … Conference for High …, 2024 - ieeexplore.ieee.org
The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has
exponentially increased the demand for computational power and bandwidth. This, combined …

MNEMOSENE: Tile Architecture and Simulator for Memristor-based Computation-in-memory

M Zahedi, MA Lebdeh, C Bengel, D Wouters… - ACM Journal on …, 2022 - dl.acm.org
In recent years, we have been witnessing a trend toward in-memory computing for future
generations of computers, which differs from the traditional von Neumann architecture in which …

Recent and upcoming developments in randomized numerical linear algebra for machine learning

M Dereziński, MW Mahoney - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
Large matrices arise in many machine learning and data analysis applications, including as
representations of datasets, graphs, model weights, and first and second-order derivatives …
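
One classical RandNLA primitive from the area this survey covers is sketch-and-solve least squares: compress a tall problem with a random sketching matrix and solve the smaller system. A dense Gaussian sketch is used below purely for clarity; practical methods use structured sketches (SRHT, sparse embeddings) so the sketching step itself is cheap.

```python
# Sketch-and-solve least squares with a Gaussian sketch (illustrative choice).
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 10000, 50, 500          # tall problem, sketch size s << n
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

S = rng.standard_normal((s, n)) / np.sqrt(s)   # random sketching matrix
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # solve the compressed problem
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error:", np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```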

BBS: Bi-directional bit-level sparsity for deep learning acceleration

Y Chen, J Meng, J Seo… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable
within bit-serial deep learning accelerators. This type of sparsity at the bit level is especially …
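
To make "skipping ineffectual zero-bit operations" concrete, here is a tiny software model of bit-serial multiply-accumulate: each set bit of the weight contributes one shift-and-add, and zero bits cost nothing. This illustrates plain bit-level sparsity only, not BBS's bi-directional encoding.

```python
# Toy model of bit-serial multiplication that skips zero bits.
def bit_serial_mul(x: int, w: int) -> int:
    """Multiply x by a non-negative integer weight, one add per set bit."""
    acc = 0
    bit = 0
    while w:
        if w & 1:                # effectual bit: one shift-and-add
            acc += x << bit
        w >>= 1                  # ineffectual zero bits cost nothing
        bit += 1
    return acc

assert bit_serial_mul(7, 0b10010110) == 7 * 0b10010110
print(bit_serial_mul(7, 150))    # 1050
```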

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

H You, Y Guo, Y Fu, W Zhou, H Shi, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown impressive performance on language tasks but
face challenges when deployed on resource-constrained devices due to their extensive …
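
A minimal sketch of the multiplication-less idea: round each weight to a signed power of two, so multiplying by it reduces to a bit shift (or an exponent add for floats). This is not ShiftAddLLM's post-training reparameterization algorithm, only the underlying primitive; the helper below is hypothetical.

```python
# Sketch only: round weights to signed powers of two so multiplies become shifts.
import numpy as np

def to_power_of_two(W):
    sign = np.sign(W)
    exp = np.round(np.log2(np.abs(W) + 1e-12)).astype(np.int32)
    return sign, exp             # w ~ sign * 2**exp; x * w is then a shift by exp

W = np.random.randn(4, 4).astype(np.float32)
sign, exp = to_power_of_two(W)
W_hat = sign * np.exp2(exp.astype(np.float32))
print("max abs error:", np.abs(W - W_hat).max())
```

The rounding error of a single power-of-two code is large, which is why such methods add correction terms or multiple shift-add stages in practice.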