Performance aware convolutional neural network channel pruning for embedded GPUs

V Radu, K Kaszyk, Y Wen, J Turner… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Convolutional Neural Networks (CNN) are becoming a common presence in many
applications and services, due to their superior recognition accuracy. They are increasingly …

DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration

P Gibson, J Cano, E Crowley, A Storkey… - ACM Transactions on …, 2024 - dl.acm.org
Deep Neural Networks (DNNs) are very computationally demanding, which presents a
significant barrier to their deployment, especially on resource-constrained devices …

DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack

P Gibson, J Cano, EJ Crowley, A Storkey… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep Neural Networks (DNNs) are extremely computationally demanding, which presents a
large barrier to their deployment on resource-constrained devices. Since such devices are …

Optimising hardware accelerated neural networks with quantisation and a knowledge distillation evolutionary algorithm

R Stewart, A Nowlan, P Bacchus, Q Ducasse… - Electronics, 2021 - mdpi.com
This paper compares the latency, accuracy, training time and hardware costs of neural
networks compressed with our new multi-objective evolutionary algorithm called NEMOKD …

Neural architecture search as program transformation exploration

J Turner, EJ Crowley, MFP O'Boyle - Proceedings of the 26th ACM …, 2021 - dl.acm.org
Improving the performance of deep neural networks (DNNs) is important to both the compiler
and neural architecture search (NAS) communities. Compilers apply program …

Compiler-centric across-stack deep learning acceleration

P Gibson - 2023 - theses.gla.ac.uk
Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning
approaches increasingly providing state-of-the-art solutions to a variety of difficult problems …

[BOOK][B] Latency-aware structured pruning of pretrained transformer-based models

A Hoffman - 2022 - search.proquest.com
The use of BERT-based Natural Language Processing models has rapidly grown in recent
years, yet they remain difficult to deploy on edge devices where memory and compute …

[PDF][PDF] Deep learning on a low power GPU

P Gibson - University of Edinburgh, Project Archive, 2019 - project-archive.inf.ed.ac.uk
This report details the design, implementation, and evaluation of “Orpheus”, a tool to
benchmark the inference of deep learning systems on heterogeneous devices, and enable …

Simulation methodologies for mobile GPUs

K Kaszyk - 2022 - era.ed.ac.uk
GPUs critically rely on a complex system software stack comprising kernel- and user-space
drivers and JIT compilers. Yet, existing GPU simulators typically abstract away details of the …

[PDF][PDF] Finding the Right Teacher for a Difficult Student

D Whettam - 2020 - dwhettam.github.io
Daniel Whettam, The University of Edinburgh / The University of Bristol, March …