PaLM: Scaling language modeling with Pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of Machine Learning Research 24 (240), 1-113, 2023 | 4056 | 2023 |
GPT-4 technical report J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... arXiv preprint arXiv:2303.08774, 2023 | 3430* | 2023 |
Scaling instruction-finetuned language models HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, Y Li, X Wang, ... Journal of Machine Learning Research 25 (70), 1-53, 2024 | 2186 | 2024 |
Deep Graph Infomax. P Veličković, W Fedus, WL Hamilton, P Liò, Y Bengio, RD Hjelm ICLR (Poster) 2 (3), 4, 2019 | 2142 | 2019 |
Emergent abilities of large language models J Wei, Y Tay, R Bommasani, C Raffel, B Zoph, S Borgeaud, D Yogatama, ... arXiv preprint arXiv:2206.07682, 2022 | 1848 | 2022 |
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity W Fedus, B Zoph, N Shazeer Journal of Machine Learning Research 23 (120), 1-39, 2022 | 1528 | 2022 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 881 | 2022 |
Deep graph infomax P Veličković, W Fedus, WL Hamilton, P Liò, Y Bengio, RD Hjelm arXiv preprint arXiv:1809.10341, 2018 | 717 | 2018 |
MaskGAN: Better Text Generation via Filling in the ______ W Fedus, I Goodfellow, AM Dai International Conference on Learning Representations (ICLR 2018), 2018 | 611 | 2018 |
In silico labeling: Predicting fluorescent labels in unlabeled images SF Eric Christiansen, Samuel J. Yang, D. Michael Ando, Ashkan Javaherian ... Cell, 2018 | 606 | 2018 |
GLaM: Efficient scaling of language models with mixture-of-experts N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... International Conference on Machine Learning, 5547-5569, 2022 | 422 | 2022 |
Revisiting ResNets: Improved training and scaling strategies I Bello, W Fedus, X Du, ED Cubuk, A Srinivas, TY Lin, J Shlens, B Zoph Advances in Neural Information Processing Systems 34, 22614-22627, 2021 | 320 | 2021 |
Revisiting fundamentals of experience replay W Fedus, P Ramachandran, R Agarwal, Y Bengio, H Larochelle, ... International conference on machine learning, 3061-3071, 2020 | 282 | 2020 |
Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step W Fedus, M Rosca, B Lakshminarayanan, AM Dai, S Mohamed, ... International Conference on Learning Representations (ICLR 2018), 2017 | 257 | 2017 |
The case for a directional dark matter detector and the status of current experimental efforts S Ahlen, N Afshordi, JBR Battat, J Billard, N Bozorgnia, S Burgos, ... International Journal of Modern Physics A 25 (01), 1-51, 2010 | 248 | 2010 |
Language GANs Falling Short M Caccia, L Caccia, W Fedus, H Larochelle, J Pineau, L Charlin International Conference on Learning Representations (ICLR 2020), 2018 | 233 | 2018 |
ChatGPT: Optimizing language models for dialogue J Schulman, B Zoph, C Kim, J Hilton, J Menick, J Weng, JFC Uribe, ... OpenAI blog 2 (4), 2022 | 197 | 2022 |
Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, and Claire Cui N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... 2021 | 137 | 2021 |
Do transformer modifications transfer across implementations and applications? S Narang, HW Chung, Y Tay, W Fedus, T Fevry, M Matena, K Malkan, ... arXiv preprint arXiv:2102.11972, 2021 | 118* | 2021 |
Hyperbolic discounting and learning over multiple horizons W Fedus, C Gelada, Y Bengio, MG Bellemare, H Larochelle Reinforcement Learning and Decision Making (RLDM 2019), 2019 | 118 | 2019 |