Enabling resource-efficient AIoT systems with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI+ IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
TileFlow: A framework for modeling fusion dataflow via tree-based analysis
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
Chimera: An analytical optimizing framework for effective compute-intensive operators fusion
Machine learning models with various tensor operators are becoming ubiquitous in recent
years. There are two types of operators in machine learning: compute-intensive operators …
Welder: Scheduling deep learning memory access via tile-graph
With the growing demand for processing higher fidelity data and the use of faster computing
cores in newer hardware accelerators, modern deep neural networks (DNNs) are becoming …
Whale: Efficient giant model training over heterogeneous GPUs
The scaling up of deep neural networks has been demonstrated to be effective in improving
model quality, but also encompasses several training challenges in terms of training …
A tensor compiler with automatic data packing for simple and efficient fully homomorphic encryption
Fully Homomorphic Encryption (FHE) enables computing on encrypted data, letting clients
securely offload computation to untrusted servers. While enticing, FHE has two key …
Aspen: Breaking operator barriers for efficient parallelization of deep neural networks
Modern Deep Neural Network (DNN) frameworks use tensor operators as the main
building blocks of DNNs. However, we observe that operator-based construction of DNNs …
BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach
Compiler optimization plays an increasingly important role to boost the performance of
machine learning models for data processing and management. With increasingly complex …
DREW: Efficient Winograd CNN inference with deep reuse
Deep learning has been used in various domains, including Web services. Convolutional
neural networks (CNNs), which are deep learning representatives, are among the most …
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Dynamic shape computations have become critical in modern machine learning workloads,
especially in emerging large language models. The success of these models has driven …