Fiber laser development enabled by machine learning: review and prospect

M Jiang, H Wu, Y An, T Hou, Q Chang, L Huang, J Li… - PhotoniX, 2022 - Springer
In recent years, machine learning, especially various deep neural networks, as an emerging
technique for data analysis and processing, has brought novel insights into the development …

From Turing to transformers: A comprehensive review and tutorial on the evolution and applications of generative transformer models

EY Zhang, AD Cheok, Z Pan, J Cai, Y Yan - Sci, 2023 - mdpi.com
In recent years, generative transformers have become increasingly prevalent in the field of
artificial intelligence, especially within the scope of natural language processing. This paper …

A novel time–frequency Transformer based on self–attention mechanism and its application in fault diagnosis of rolling bearings

Y Ding, M Jia, Q Miao, Y Cao - Mechanical Systems and Signal Processing, 2022 - Elsevier
The scope of data-driven fault diagnosis models is greatly extended through deep learning
(DL). However, the classical convolution and recurrent structure have their defects in …

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction on the time and cost of training …

Activated gradients for deep neural networks

M Liu, L Chen, X Du, L Jin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep neural networks often suffer from poor performance or even training failure due to the
ill-conditioned problem, the vanishing/exploding gradient problem, and the saddle point …

Deep transfer learning for land use and land cover classification: A comparative study

R Naushad, T Kaur, E Ghaderpour - Sensors, 2021 - mdpi.com
Efficiently implementing remote sensing image classification with high spatial resolution
imagery can provide significant value in land use and land cover (LULC) classification. The …

Mime: Mimicking centralized stochastic algorithms in federated learning

SP Karimireddy, M Jaggi, S Kale, M Mohri… - arXiv preprint arXiv …, 2020 - arxiv.org
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients which gives rise to the client drift phenomenon. In fact …

Just pick a sign: Optimizing deep multitask models with gradient sign dropout

Z Chen, J Ngiam, Y Huang, T Luong… - Advances in …, 2020 - proceedings.neurips.cc
The vast majority of deep models use multiple gradient signals, typically corresponding to a
sum of multiple loss terms, to update a shared set of trainable weights. However, these …

Understanding gradient clipping in private SGD: A geometric perspective

X Chen, SZ Wu, M Hong - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Deep learning models are increasingly popular in many machine learning applications
where the training data may contain sensitive information. To provide formal and rigorous …

Tempered sigmoid activations for deep learning with differential privacy

N Papernot, A Thakurta, S Song, S Chien… - Proceedings of the …, 2021 - ojs.aaai.org
Because learning sometimes involves sensitive data, machine learning algorithms have
been extended to offer differential privacy for training data. In practice, this has been mostly …