Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture

L. Lu, Y. Jin, H. Bi, Z. Luo, P. Li, T. Wang, et al. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021.
In recent years, attention-based models have achieved impressive performance in natural
language processing and computer vision applications by effectively capturing contextual …

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction

S. Zheng, R. Chen, A. Wei, Y. Jin, Q. Han, L. Lu, et al. Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA), 2022.
Hardware specialization is a promising trend to sustain performance growth. Spatial
hardware accelerators that employ specialized and hierarchical computation and memory …

TENET: A framework for modeling tensor dataflow based on relation-centric notation

L. Lu, N. Guan, Y. Wang, L. Jia, Z. Luo, et al. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021.
Accelerating tensor applications on spatial architectures provides high performance and
energy efficiency, but requires accurate performance models for evaluating various dataflow …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S. Zheng, S. Chen, S. Gao, L. Jia, G. Sun, et al. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …

Inter-layer scheduling space definition and exploration for tiled accelerators

J. Cai, Y. Wei, Z. Wu, S. Peng, K. Ma. Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), 2023.
With the continued growth in DNN accelerator scale, inter-layer scheduling, which
studies the allocation of computing resources to each layer and the computing order of all …

Chimera: An analytical optimizing framework for effective compute-intensive operators fusion

S. Zheng, S. Chen, P. Song, R. Chen, X. Li, et al. IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023.
Machine learning models built from various tensor operators have become ubiquitous in recent
years. There are two types of operators in machine learning: compute-intensive operators …

DOSA: Differentiable model-based one-loop search for DNN accelerators

C. Hong, Q. Huang, G. Dinh, M. Subedar, et al. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.
In the hardware design space exploration process, it is critical to optimize both hardware
parameters and algorithm-to-hardware mappings. Previous work has largely approached …

Large circuit models: opportunities and challenges

L. Chen, Y. Chen, Z. Chu, W. Fang, T.-Y. Ho, et al. Science China Information Sciences, 2024.
Within the electronic design automation (EDA) domain, artificial intelligence (AI)-driven
solutions have emerged as formidable tools, yet they typically augment rather than redefine …

High-level synthesis hardware design for FPGA-based accelerators: Models, methodologies, and frameworks

R. S. Molina, V. Gil-Costa, M. L. Crespo, G. Ramponi. IEEE Access, 2022.
Hardware accelerators based on field programmable gate array (FPGA) and system on chip
(SoC) devices have gained attention in recent years. One of the main reasons is that these …

TelaMalloc: Efficient on-chip memory allocation for production machine learning accelerators

M. Maas, U. Beaugnon, A. Chauhan, B. Ilbeyi. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022.
Memory buffer allocation for on-chip memories is a major challenge in modern machine
learning systems that target ML accelerators. In interactive systems such as mobile phones …