FPGA-based accelerators of deep learning networks for learning and classification: A review
Due to recent advances in digital technologies, and availability of credible data, an area of
artificial intelligence, deep learning, has emerged and has demonstrated its ability and …
The future of FPGA acceleration in datacenters and the cloud
In this article, we survey existing academic and commercial efforts to provide Field-Programmable
Gate Array (FPGA) acceleration in datacenters and the cloud. The goal is a …
An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems
Cloud services have recently started undergoing a major shift from monolithic applications,
to graphs of hundreds or thousands of loosely-coupled microservices. Microservices …
A configurable cloud-scale DNN processor for real-time AI
J Fowers, K Ovtcharov, M Papamichael… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Interactive AI-powered services require low-latency evaluation of deep neural network
(DNN) models, aka "real-time AI". The growing demand for computationally expensive …
Azure accelerated networking: SmartNICs in the public cloud
Modern cloud architectures rely on each server running its own networking stack to
implement policies such as tunneling for virtual networks, security, and load balancing …
ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars
A Shafiee, A Nag, N Muralimanohar… - ACM SIGARCH …, 2016 - dl.acm.org
A number of recent efforts have attempted to design accelerators for popular machine
learning algorithms, such as those involving convolutional and deep neural networks (CNNs …
LegoOS: A disseminated, distributed OS for hardware resource disaggregation
The monolithic server model where a server is the unit of deployment, operation, and failure
is meeting its limits in the face of several recent hardware and application trends. To improve …
Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions
Deep learning models with convolutional and recurrent networks are now ubiquitous and
analyze massive amounts of audio, image, video, text and graph data, with applications in …
VTR 8: High-performance CAD and customizable FPGA architecture modelling
Developing Field-programmable Gate Array (FPGA) architectures is challenging due to the
competing requirements of various application domains and changing manufacturing …
Nvidia tensor core programmability, performance & precision
The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called Tensor Core,
that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The …
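The matrix-multiply-and-accumulate described in the Tensor Core entry can be sketched as follows. This is a minimal, hypothetical illustration in plain Python: the function name `mma_4x4` is ours, not an NVIDIA API, and the sketch models only the arithmetic D = A x B + C on 4x4 matrices, not the hardware's numeric precision or timing.

```python
from typing import List

# Hypothetical type alias for this sketch: a matrix as a list of rows.
Matrix = List[List[float]]

N = 4  # Tensor Core tile size from the description: 4x4 matrices


def mma_4x4(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return D = A x B + C for 4x4 matrices.

    Illustration only: this models the arithmetic of one
    matrix-multiply-and-accumulate, not the hardware's
    input/accumulator precision or per-cycle execution.
    """
    return [
        # Each output element is a 4-term dot product plus the accumulator.
        [sum(a[i][k] * b[k][j] for k in range(N)) + c[i][j] for j in range(N)]
        for i in range(N)
    ]


if __name__ == "__main__":
    identity = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
    b = [[i * N + j for j in range(N)] for i in range(N)]
    zeros = [[0] * N for _ in range(N)]
    # Multiplying by the identity and adding zero returns B unchanged.
    print(mma_4x4(identity, b, zeros) == b)  # True
```

Each such operation amounts to 4 x 4 x 4 = 64 multiply-adds (16 output elements, each a 4-term dot product), which is the work one Tensor Core completes per clock cycle according to the description above.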