Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. N Dey, G Gosal, ZC Chen, H Khachane, W Marshall, R Pathria, M Tom, ... arXiv preprint arXiv:2304.03208, 2023 | 48 | 2023 |
SlimPajama: A 627B token cleaned and deduplicated version of RedPajama. D Soboleva, F Al-Khateeb, R Myers, JR Steeves, J Hestness, N Dey. https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and …, 2023 | 46 | 2023 |
37,000 Human-Planned Robotic Grasps With Six Degrees of Freedom. VR Osorio, R Iyengar, X Yao, P Bhattachan, A Ragobar, N Dey, B Tripp. IEEE Robotics and Automation Letters 5 (2), 3346-3351, 2020 | 5 | 2020 |
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model. N Dey, D Soboleva, F Al-Khateeb, B Yang, R Pathria, H Khachane, ... arXiv preprint arXiv:2309.11568, 2023 | 2 | 2023 |
Position Interpolation Improves ALiBi Extrapolation. F Al-Khateeb, N Dey, D Soboleva, J Hestness. arXiv preprint arXiv:2310.13017, 2023 | 1 | 2023 |
Studying CNN representations through activation dimensionality reduction and visualization. NS Dey. University of Waterloo, 2021 | 1 | 2021 |
Sparse maximal update parameterization: A holistic approach to sparse training dynamics. N Dey, S Bergsma, J Hestness. arXiv preprint arXiv:2405.15743, 2024 | | 2024 |
Identifying and interpreting tuning dimensions in deep networks. NS Dey, JE Taylor, BP Tripp, A Wong, GW Taylor. NeurIPS 2020 Workshop on Shared Visual Representations in Human & Machine …, 2020 | | 2020 |