High-level synthesis hardware design for FPGA-based accelerators: Models, methodologies, and frameworks

RS Molina, V Gil-Costa, ML Crespo, G Ramponi - IEEE Access, 2022 - ieeexplore.ieee.org
Hardware accelerators based on field-programmable gate array (FPGA) and system-on-chip
(SoC) devices have gained attention in recent years. One of the main reasons is that these …

Distributed and deep vertical federated learning with big data

J Liu, X Zhou, L Mo, S Ji, Y Liao, Z Li… - Concurrency and …, 2023 - Wiley Online Library
In recent years, data are typically distributed across multiple organizations, while data security
is becoming increasingly important. Federated learning (FL), which enables multiple parties …
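
A minimal sketch of the vertical federated learning setting the abstract alludes to: two parties hold different feature columns of the same samples and only exchange partial scores and a shared error signal. The party names, model (logistic regression), and exchange pattern are illustrative assumptions, not the paper's protocol.

    import numpy as np

    # Illustrative vertical FL sketch: party A and party B hold disjoint feature
    # columns of the same samples; each trains only its own weights.
    rng = np.random.default_rng(0)
    n = 256
    X_a = rng.normal(size=(n, 3))      # features held by party A
    X_b = rng.normal(size=(n, 2))      # features held by party B
    y = rng.integers(0, 2, size=n)     # labels, assumed to sit with party A

    w_a, w_b, lr = np.zeros(3), np.zeros(2), 0.1

    for step in range(100):
        # Each party computes a partial logit on its own features.
        z = X_a @ w_a + X_b @ w_b
        p = 1.0 / (1.0 + np.exp(-z))   # joint prediction
        g = (p - y) / n                # shared error signal
        # Each party updates only its own weights from the shared error.
        w_a -= lr * X_a.T @ g
        w_b -= lr * X_b.T @ g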

DLB: a dynamic load balance strategy for distributed training of deep neural networks

Q Ye, Y Zhou, M Shi, Y Sun, J Lv - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Synchronous strategies with data parallelism are widely utilized in distributed training of
Deep Neural Networks (DNNs), largely owing to their easy implementation yet promising …
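
A minimal sketch of the synchronous data-parallel pattern this entry builds on: every worker computes a gradient on its own shard, the gradients are averaged (the all-reduce step), and all replicas apply the identical update. Worker count, model, and data are placeholders, simulated in one process.

    import numpy as np

    rng = np.random.default_rng(1)
    num_workers = 4
    X = rng.normal(size=(400, 5))
    y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=400)
    shards = np.array_split(np.arange(400), num_workers)

    w, lr = np.zeros(5), 0.05
    for step in range(200):
        grads = []
        for s in shards:                            # one gradient per worker
            grads.append(2 * X[s].T @ (X[s] @ w - y[s]) / len(s))
        g = np.mean(grads, axis=0)                  # all-reduce (average) step
        w -= lr * g                                 # same update on every replica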

Aladdin: Asymmetric centralized training for distributed deep learning

Y Ko, K Choi, H Jei, D Lee, SW Kim - Proceedings of the 30th ACM …, 2021 - dl.acm.org
To speed up the training of massive deep neural network (DNN) models, distributed training
has been widely studied. In general, centralized training, a type of distributed training …
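
A sketch of the centralized ("parameter server") baseline that this line of work starts from: workers pull the current parameters, compute gradients on local shards, and push them to a single server that applies the updates. This is the generic pattern only, simulated sequentially; Aladdin's asymmetric scheme is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 4))
    y = X @ rng.normal(size=4)
    shards = np.array_split(np.arange(300), 3)

    server_w, lr = np.zeros(4), 0.05        # parameters live on the server
    for step in range(150):
        for s in shards:                    # each worker in turn
            w_local = server_w.copy()       # pull current parameters
            g = 2 * X[s].T @ (X[s] @ w_local - y[s]) / len(s)
            server_w -= lr * g              # push gradient; server applies it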

VirtualFlow: Decoupling deep learning models from the underlying hardware

A Or, H Zhang, MJ Freedman - Proceedings of Machine …, 2022 - proceedings.mlsys.org
We propose VirtualFlow, a system leveraging a novel abstraction called virtual node
processing to decouple the model from the hardware. In each step of training or inference …
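
A hedged reconstruction of the idea behind virtual node processing (not VirtualFlow's API): a fixed number of virtual nodes defines the model's effective batch, and however many physical devices exist, each one processes several virtual nodes sequentially and accumulates gradients, so training behavior does not depend on the hardware count.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(240, 4))
    y = X @ rng.normal(size=4)

    virtual_nodes = 8                       # fixed by the model configuration
    physical_devices = 2                    # could change without changing results
    vn_shards = np.array_split(np.arange(240), virtual_nodes)

    w, lr = np.zeros(4), 0.05
    for step in range(100):
        g_sum = np.zeros(4)
        for vn, s in enumerate(vn_shards):
            device = vn % physical_devices  # which device would run this virtual
                                            # node (unused in this one-process sketch)
            g_sum += 2 * X[s].T @ (X[s] @ w - y[s]) / len(s)
        w -= lr * g_sum / virtual_nodes     # identical update for any device count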

FLSGD: free local SGD with parallel synchronization

Q Ye, Y Zhou, M Shi, J Lv - The Journal of Supercomputing, 2022 - Springer
Synchronous parameter-update algorithms with data parallelism have been successfully used to
accelerate the distributed training of deep neural networks (DNNs). However, a prevalent …
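
A minimal sketch of local SGD with periodic synchronization, the general pattern behind this entry: each worker takes H local steps on its own shard, then the parameters (rather than per-step gradients) are averaged. H, the model, and the data are placeholders, not FLSGD's settings.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(400, 5))
    y = X @ rng.normal(size=5)
    shards = np.array_split(np.arange(400), 4)

    w_global = np.zeros(5)
    lr, H = 0.05, 8                         # H local steps between syncs

    for rnd in range(30):
        local_models = []
        for s in shards:
            w = w_global.copy()
            for _ in range(H):              # local updates, no communication
                w -= lr * 2 * X[s].T @ (X[s] @ w - y[s]) / len(s)
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)   # synchronize by averaging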

SHAT: A Novel Asynchronous Training Algorithm That Provides Fast Model Convergence in Distributed Deep Learning

Y Ko, SW Kim - Applied Sciences, 2021 - mdpi.com
The recent unprecedented success of deep learning (DL) in various fields is underpinned by its
use of large-scale data and models. Training a large-scale deep neural network (DNN) …
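
A generic sketch of asynchronous training, the setting SHAT addresses: workers compute gradients against possibly stale copies of the parameters and apply them without waiting for each other. Staleness is simulated with a snapshot taken a few updates ago; this illustrates the pattern, not SHAT's update rule.

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, 4))
    y = X @ rng.normal(size=4)
    shards = np.array_split(np.arange(300), 3)

    w = np.zeros(4)
    lr, staleness = 0.02, 2
    history = [w.copy()]

    for step in range(300):
        s = shards[step % len(shards)]
        w_stale = history[max(0, len(history) - 1 - staleness)]   # stale read
        g = 2 * X[s].T @ (X[s] @ w_stale - y[s]) / len(s)
        w = w - lr * g                      # applied without any barrier
        history.append(w.copy())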

Shuffle Private Decentralized Convex Optimization

L Zhang, H Zhang - IEEE Transactions on Information …, 2024 - ieeexplore.ieee.org
In this paper, we consider the distributed local stochastic gradient descent (SGD) algorithm
by parallelizing multiple devices in the setting of stochastic convex optimization (SCO). The …
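
A generic sketch of parallel local SGD in the shuffle model of privacy, the setting this abstract describes: each device runs a few local steps, perturbs its parameter update with Gaussian noise, and a shuffler permutes the updates before the server averages them. The noise scale, clipping-free update, and protocol details are illustrative assumptions, not the paper's mechanism or guarantees.

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(400, 5))
    y = X @ rng.normal(size=5)
    shards = np.array_split(np.arange(400), 8)

    w_global = np.zeros(5)
    lr, H, sigma = 0.05, 4, 0.01            # sigma: illustrative noise scale

    for rnd in range(30):
        updates = []
        for s in shards:                    # each device, in parallel in practice
            w = w_global.copy()
            for _ in range(H):              # local SGD steps, no communication
                w -= lr * 2 * X[s].T @ (X[s] @ w - y[s]) / len(s)
            updates.append(w - w_global + sigma * rng.normal(size=5))  # noisy delta
        rng.shuffle(updates)                # shuffler hides which device sent what
        w_global = w_global + np.mean(updates, axis=0)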

LBB: load-balanced batching for efficient distributed learning on heterogeneous GPU cluster

F Yao, Z Zhang, Z Ji, B Liu, H Gao - The Journal of Supercomputing, 2024 - Springer
As the cost of deep learning training increases, using heterogeneous GPU clusters is a
reasonable way to scale cluster resources to support distributed deep learning (DDL) tasks …
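
A sketch of the load-balanced batching idea: per-GPU batch sizes are set proportional to measured throughput so that all workers finish a step at roughly the same time. The throughput numbers are made up and LBB's actual assignment model is not reproduced here.

    def balanced_batch_sizes(global_batch, throughputs):
        """Split a global batch in proportion to each device's samples/sec."""
        total = sum(throughputs)
        sizes = [int(round(global_batch * t / total)) for t in throughputs]
        sizes[-1] += global_batch - sum(sizes)   # fix rounding so sizes sum exactly
        return sizes

    # e.g. one fast GPU and two slower cards sharing a 512-sample global batch
    print(balanced_batch_sizes(512, [900.0, 450.0, 300.0]))   # -> [279, 140, 93]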

ZipLine: an optimized algorithm for the elastic bulk synchronous parallel model

X Zhao, M Papagelis, A An, BX Chen, J Liu, Y Hu - Machine Learning, 2021 - Springer
The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-
purpose parallel computing that has successfully been employed for distributed training of …
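
A minimal sketch of the plain bulk synchronous parallel pattern for reference: every worker computes locally, then all block on a barrier before the next superstep. ZipLine relaxes this into an elastic BSP; only the standard model is shown, with made-up local work.

    import threading
    import numpy as np

    num_workers = 4
    barrier = threading.Barrier(num_workers)
    partial = np.zeros(num_workers)

    def worker(rank, supersteps=3):
        rng = np.random.default_rng(rank)
        for step in range(supersteps):
            partial[rank] = rng.random()    # local computation phase
            barrier.wait()                  # synchronization: wait for all workers
            if rank == 0:
                print(f"superstep {step}: mean = {partial.mean():.3f}")
            barrier.wait()                  # ensure the read finishes before next write

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()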