Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture
In recent years, attention-based models have achieved impressive performance in natural
language processing and computer vision applications by effectively capturing contextual …
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction
Hardware specialization is a promising trend to sustain performance growth. Spatial
hardware accelerators that employ specialized and hierarchical computation and memory …
TENET: A framework for modeling tensor dataflow based on relation-centric notation
Accelerating tensor applications on spatial architectures provides high performance and
energy-efficiency, but requires accurate performance models for evaluating various dataflow …
TileFlow: A framework for modeling fusion dataflow via tree-based analysis
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
Inter-layer scheduling space definition and exploration for tiled accelerators
With the continuous expansion of the DNN accelerator scale, inter-layer scheduling, which
studies the allocation of computing resources to each layer and the computing order of all …
Chimera: An analytical optimizing framework for effective compute-intensive operators fusion
Machine learning models with various tensor operators have become ubiquitous in recent
years. There are two types of operators in machine learning: compute-intensive operators …
DOSA: Differentiable model-based one-loop search for DNN accelerators
In the hardware design space exploration process, it is critical to optimize both hardware
parameters and algorithm-to-hardware mappings. Previous work has largely approached …
Large circuit models: opportunities and challenges
Within the electronic design automation (EDA) domain, artificial intelligence (AI)-driven
solutions have emerged as formidable tools, yet they typically augment rather than redefine …
High-level synthesis hardware design for FPGA-based accelerators: Models, methodologies, and frameworks
Hardware accelerators based on field programmable gate array (FPGA) and system on chip
(SoC) devices have gained attention in recent years. One of the main reasons is that these …
Telamalloc: Efficient on-chip memory allocation for production machine learning accelerators
Memory buffer allocation for on-chip memories is a major challenge in modern machine
learning systems that target ML accelerators. In interactive systems such as mobile phones …
learning systems that target ML accelerators. In interactive systems such as mobile phones …