Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
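
As a rough, hedged sketch of why fusing layers cuts off-chip traffic (the tensor size and datatype below are invented for illustration and are not taken from the paper): the intermediate between two fused layers never makes a round trip through DRAM.

```python
# Back-of-the-envelope accounting of the DRAM traffic that layer fusion removes.
# Sizes are hypothetical, purely illustrative.
N = 4096 * 4096            # elements in the intermediate tensor between two layers
BYTES = 4                  # fp32

# Unfused: layer 1 writes the intermediate to DRAM, layer 2 reads it back.
unfused_traffic = 2 * N * BYTES

# Fused: the intermediate stays in on-chip buffers (registers / shared memory).
fused_traffic = 0

print(f"off-chip traffic avoided by fusing the pair: "
      f"{(unfused_traffic - fused_traffic) / 2**20:.0f} MiB")
```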

Chimera: An analytical optimizing framework for effective compute-intensive operators fusion

S Zheng, S Chen, P Song, R Chen, X Li… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Machine learning models with various tensor operators have become ubiquitous in recent
years. There are two types of operators in machine learning: compute-intensive operators …
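
A minimal sketch of that compute-intensive vs. memory-intensive split in terms of arithmetic intensity (FLOPs per byte moved); the operator sizes are assumptions for illustration only:

```python
def matmul_intensity(m, n, k, bytes_per_elem=4):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], counting reads of A, B and the write of C."""
    flops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved

def elementwise_add_intensity(n_elems, bytes_per_elem=4):
    """FLOPs per byte for c = a + b: one FLOP per element, two reads and one write."""
    return n_elems / (3 * n_elems * bytes_per_elem)

# A large matmul is compute-intensive; an elementwise add is memory-intensive.
print(f"matmul 4096x4096x4096: {matmul_intensity(4096, 4096, 4096):.1f} FLOPs/byte")
print(f"elementwise add, 16M elems: {elementwise_add_intensity(4096 * 4096):.2f} FLOPs/byte")
```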

Welder: Scheduling deep learning memory access via tile-graph

Y Shi, Z Yang, J Xue, L Ma, Y Xia, Z Miao… - … USENIX Symposium on …, 2023 - usenix.org
With the growing demand for processing higher fidelity data and the use of faster computing
cores in newer hardware accelerators, modern deep neural networks (DNNs) are becoming …

Whale: Efficient giant model training over heterogeneous GPUs

X Jia, L Jiang, A Wang, W Xiao, Z Shi, J Zhang… - 2022 USENIX Annual …, 2022 - usenix.org
Scaling up deep neural networks has been demonstrated to be effective in improving
model quality, but it also introduces several challenges in terms of training …

Aspen: Breaking operator barriers for efficient parallelization of deep neural networks

J Park, K Bin, G Park, S Ha… - Advances in Neural …, 2024 - proceedings.neurips.cc
Modern Deep Neural Network (DNN) frameworks use tensor operators as the main
building blocks of DNNs. However, we observe that operator-based construction of DNNs …

DREW: Efficient Winograd CNN inference with deep reuse

R Wu, F Zhang, J Guan, Z Zheng, X Du… - Proceedings of the ACM …, 2022 - dl.acm.org
Deep learning has been used in various domains, including Web services. Convolutional
neural networks (CNNs), which are deep learning representatives, are among the most …
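
Since the title leans on the Winograd transform, here is a self-contained NumPy sketch of the standard F(2,3) minimal-filtering algorithm (two convolution outputs from four multiplies); it illustrates the transform itself, not the paper's deep-reuse mechanism:

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f2_3(d, g):
    """Two outputs of the 'valid' correlation of a length-4 signal d with a 3-tap filter g."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
print(winograd_f2_3(d, g))               # [6. 9.]
print(np.correlate(d, g, mode="valid"))  # [6. 9.] -- direct reference
```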

BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

R Lai, J Shao, S Feng, SS Lyubomirsky, B Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Dynamic shape computations have become critical in modern machine learning workloads,
especially in emerging large language models. The success of these models has driven …
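
As a hedged illustration of the user-facing dynamic-shape problem these compilers (Relax, and BladeDISC above) target, the sketch below uses PyTorch's torch.compile(dynamic=True) purely as a stand-in; it is not the Relax API:

```python
import torch

def attention_scores(q, k):
    # The same function must serve many sequence lengths without per-shape recompilation.
    return torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)

compiled = torch.compile(attention_scores, dynamic=True)

for seq_len in (128, 333, 1024):   # varying sequence lengths, one compiled artifact
    q = torch.randn(1, seq_len, 64)
    k = torch.randn(1, seq_len, 64)
    print(seq_len, compiled(q, k).shape)
```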

TCB: Accelerating transformer inference services with request concatenation

B Fu, F Chen, P Li, D Zeng - … of the 51st International Conference on …, 2022 - dl.acm.org
The Transformer has dominated the field of natural language processing because of its strong
capability in learning from sequential input data. In recent years, various computing and …
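
A minimal sketch of the request-concatenation idea named in the title: pack variable-length requests back to back instead of padding them to the longest one, then split outputs with per-request offsets. Sizes and names are illustrative, not the paper's implementation:

```python
import numpy as np

requests = [np.arange(5), np.arange(12), np.arange(3)]   # token ids of 3 requests

# Padded batching: 3 x 12 = 36 token slots, 16 of them wasted on padding.
padded = np.zeros((len(requests), max(len(r) for r in requests)), dtype=int)
for i, r in enumerate(requests):
    padded[i, :len(r)] = r

# Concatenated batching: one 20-token sequence plus offsets so the serving layer
# can mask attention across request boundaries and split the outputs afterwards.
concatenated = np.concatenate(requests)
offsets = np.cumsum([0] + [len(r) for r in requests])    # [ 0  5 17 20]

print("padded slots:", padded.size, " concatenated slots:", concatenated.size)
```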