Continual Pre-Training of Large Language Models: How to (re)warm your model? K Gupta, B Thérien, A Ibrahim, ML Richter, Q Anthony, E Belilovsky, I Rish, ... arXiv preprint arXiv:2308.04014, 2023 | 43 | 2023 |
Simple and scalable strategies to continually pre-train large language models A Ibrahim, B Thérien, K Gupta, ML Richter, Q Anthony, T Lesort, ... arXiv preprint arXiv:2403.08763, 2024 | 22 | 2024 |
Parametric scattering networks S Gauthier, B Thérien, L Alsene-Racicot, M Chaudhary, I Rish, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 22 | 2022 |
Out-of-distribution detection for lidar-based 3d object detection C Huang, V Abdelzad, CG Mannes, L Rowe, B Therien, R Salay, ... 2022 IEEE 25th International Conference on Intelligent Transportation …, 2022 | 16 | 2022 |
GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch, 9 2023 A Andonian, Q Anthony, S Biderman, S Black, P Gali, L Gao, E Hallahan, ... URL https://www.github.com/eleutherai/gpt-neox | 7 | |
Comparison of radiologists and deep learning for US grading of hepatic steatosis P Vianna, SI Calce, P Boustros, C Larocque-Rigney, L Patry-Beaudoin, ... Radiology 309 (1), e230659, 2023 | 6 | 2023 |
CLaC-BP at SemEval-2021 Task 8: SciBERT Plus Rules for MeasEval Benjamin Therien, Parsa Bagherzadeh, Sabine Bergler Proceedings of the 15th International Workshop on Semantic Evaluation …, 2021 | 4 | 2021 |
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers B Thérien, CÉ Joseph, B Knyazev, E Oyallon, I Rish, E Belilovsky arXiv preprint arXiv:2406.00153, 2024 | 1 | 2024 |
Object Re-Identification from Point Clouds B Thérien, C Huang, A Chow, K Czarnecki Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024 | | 2024 |
Can We Learn Communication-Efficient Optimizers? CÉ Joseph, B Thérien, A Moudgil, B Knyazev, E Belilovsky arXiv preprint arXiv:2312.02204, 2023 | | 2023 |
A Closer Look at Robustness to L-infinity and Spatial Perturbations and their Composition L Rowe, B Thérien, K Czarnecki, H Zhang arXiv preprint arXiv:2210.02577, 2022 | | 2022 |
Learning Communication-Efficient Optimizers CÉ Joseph, B Thérien, A Moudgil, B Knyazev, E Belilovsky | | |
Learning Optimizers for Local SGD CÉ Joseph, B Thérien, A Moudgil, B Knyazev, E Belilovsky International Workshop on Federated Learning in the Age of Foundation Models … | | |