mixup: Beyond empirical risk minimization H Zhang, M Cisse, YN Dauphin, D Lopez-Paz arXiv preprint arXiv:1710.09412, 2017 | 10460 | 2017 |
Convolutional sequence to sequence learning J Gehring, M Auli, D Grangier, D Yarats, YN Dauphin International conference on machine learning, 1243-1252, 2017 | 4139 | 2017 |
Language modeling with gated convolutional networks YN Dauphin, A Fan, M Auli, D Grangier International conference on machine learning, 933-941, 2017 | 2725 | 2017 |
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization YN Dauphin, R Pascanu, C Gulcehre, K Cho, S Ganguli, Y Bengio Advances in neural information processing systems 27, 2014 | 1792 | 2014 |
Hierarchical neural story generation A Fan, M Lewis, Y Dauphin arXiv preprint arXiv:1805.04833, 2018 | 1579 | 2018 |
Theano: A Python framework for fast computation of mathematical expressions R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ... arXiv e-prints, arXiv:1605.02688, 2016 | 918 | 2016 |
Parseval networks: Improving robustness to adversarial examples M Cisse, P Bojanowski, E Grave, Y Dauphin, N Usunier International conference on machine learning, 854-863, 2017 | 891 | 2017 |
Using recurrent neural networks for slot filling in spoken language understanding G Mesnil, Y Dauphin, K Yao, Y Bengio, L Deng, D Hakkani-Tur, X He, ... IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (3), 530-539, 2014 | 762 | 2014 |
Equilibrated adaptive learning rates for non-convex optimization Y Dauphin, H De Vries, Y Bengio Advances in neural information processing systems 28, 2015 | 669 | 2015 |
Pay less attention with lightweight and dynamic convolutions F Wu, A Fan, A Baevski, YN Dauphin, M Auli arXiv preprint arXiv:1901.10430, 2019 | 661 | 2019 |
A convolutional encoder model for neural machine translation J Gehring, M Auli, D Grangier, YN Dauphin arXiv preprint arXiv:1611.02344, 2016 | 588 | 2016 |
EmoNets: Multimodal deep learning approaches for emotion recognition in video SE Kahou, X Bouthillier, P Lamblin, C Gulcehre, V Michalski, K Konda, ... Journal on Multimodal User Interfaces 10, 99-111, 2016 | 499 | 2016 |
Deal or no deal? End-to-end learning for negotiation dialogues M Lewis, D Yarats, YN Dauphin, D Parikh, D Batra arXiv preprint arXiv:1706.05125, 2017 | 464 | 2017 |
Better mixing via deep representations Y Bengio, G Mesnil, Y Dauphin, S Rifai International conference on machine learning, 552-560, 2013 | 438 | 2013 |
Combining modality specific deep neural networks for emotion recognition in video SE Kahou, C Pal, X Bouthillier, P Froumenty, Ç Gülçehre, R Memisevic, ... Proceedings of the 15th ACM on International conference on multimodal …, 2013 | 434 | 2013 |
Empirical analysis of the Hessian of over-parametrized neural networks L Sagun, U Evci, VU Guney, Y Dauphin, L Bottou arXiv preprint arXiv:1706.04454, 2017 | 379 | 2017 |
Fixup initialization: Residual learning without normalization H Zhang, YN Dauphin, T Ma arXiv preprint arXiv:1901.09321, 2019 | 371 | 2019 |
The manifold tangent classifier S Rifai, YN Dauphin, P Vincent, Y Bengio, X Muller Advances in neural information processing systems 24, 2011 | 349 | 2011 |
mixup: Beyond empirical risk minimization H Zhang, M Cisse, YN Dauphin, D Lopez-Paz arXiv preprint arXiv:1710.09412, 2019 | 292 | 2019 |
Unsupervised and transfer learning challenge: a deep learning approach G Mesnil, Y Dauphin, X Glorot, S Rifai, Y Bengio, I Goodfellow, E Lavoie, ... Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 97-110, 2012 | 289 | 2012 |