Performance aware convolutional neural network channel pruning for embedded GPUs
Convolutional Neural Networks (CNN) are becoming a common presence in many
applications and services, due to their superior recognition accuracy. They are increasingly …
DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration
Deep Neural Networks (DNNs) are very computationally demanding, which presents a
significant barrier to their deployment, especially on resource-constrained devices …
DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack
Deep Neural Networks (DNNs) are extremely computationally demanding, which presents a
large barrier to their deployment on resource-constrained devices. Since such devices are …
Optimising hardware accelerated neural networks with quantisation and a knowledge distillation evolutionary algorithm
This paper compares the latency, accuracy, training time and hardware costs of neural
networks compressed with our new multi-objective evolutionary algorithm called NEMOKD …
Neural architecture search as program transformation exploration
Improving the performance of deep neural networks (DNNs) is important to both the compiler
and neural architecture search (NAS) communities. Compilers apply program …
Compiler-centric across-stack deep learning acceleration
P Gibson - 2023 - theses.gla.ac.uk
Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning
approaches increasingly providing state-of-the-art solutions to a variety of difficult problems …
[BOOK][B] Latency-aware structured pruning of pretrained transformer-based models
A Hoffman - 2022 - search.proquest.com
The use of BERT-based Natural Language Processing models has rapidly grown in recent
years, yet they remain difficult to deploy on edge devices where memory and compute …
[PDF][PDF] Deep learning on a low power gpu
P Gibson - University of Edinburgh, Project Archive, 2019 - project-archive.inf.ed.ac.uk
This report details the design, implementation, and evaluation of “Orpheus”, a tool to
benchmark the inference of deep learning systems on heterogeneous devices, and enable …
Simulation methodologies for mobile GPUs
K Kaszyk - 2022 - era.ed.ac.uk
GPUs critically rely on a complex system software stack comprising kernel-and user-space
drivers and JIT compilers. Yet, existing GPU simulators typically abstract away details of the …
[PDF][PDF] Finding the Right Teacher for a Difficult Student
D Whettam - 2020 - dwhettam.github.io
Finding the Right Teacher for a Difficult Student. Daniel Whettam, The University of
Edinburgh / The University of Bristol, March …