Inferring causal molecular networks: empirical assessment through a community-based effort. SM Hill, LM Heiser, T Cokelaer, M Unger, NK Nesser, DE Carlin, Y Zhang, ... Nature Methods 13 (4), 310-318, 2016. Cited by 247.
Escaping saddles with stochastic gradients. H Daneshmand, J Kohler, A Lucchi, T Hofmann. International Conference on Machine Learning, 1155-1164, 2018. Cited by 164.
Exponential convergence rates for batch normalization: The power of length-direction decoupling in non-convex optimization. J Kohler, H Daneshmand, A Lucchi, T Hofmann, M Zhou, K Neymeyr. The 22nd International Conference on Artificial Intelligence and Statistics …, 2019. Cited by 148*.
Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. H Daneshmand, M Gomez-Rodriguez, L Song, B Schoelkopf. International Conference on Machine Learning, 793-801, 2014. Cited by 139.
Local saddle point optimization: A curvature exploitation approach. L Adolphs, H Daneshmand, A Lucchi, T Hofmann. The 22nd International Conference on Artificial Intelligence and Statistics …, 2019. Cited by 126.
Transformers learn to implement preconditioned gradient descent for in-context learning. K Ahn, X Cheng, H Daneshmand, S Sra. Advances in Neural Information Processing Systems 36, 2024. Cited by 89.
Batch normalization provably avoids ranks collapse for randomly initialised deep networks. H Daneshmand, J Kohler, F Bach, T Hofmann, A Lucchi. Advances in Neural Information Processing Systems 33, 18387-18398, 2020. Cited by 61.
Starting small - learning with adaptive sample sizes. H Daneshmand, A Lucchi, T Hofmann. International Conference on Machine Learning, 1463-1471, 2016. Cited by 51.
Adaptive Newton method for empirical risk minimization to statistical accuracy. A Mokhtari, H Daneshmand, A Lucchi, T Hofmann, A Ribeiro. Advances in Neural Information Processing Systems 29, 2016. Cited by 51*.
Estimating diffusion networks: Recovery conditions, sample complexity and soft-thresholding algorithm. M Gomez-Rodriguez, L Song, H Daneshmand, B Schölkopf. Journal of Machine Learning Research 17 (90), 1-29, 2016. Cited by 43*.
Batch normalization orthogonalizes representations in deep random networks. H Daneshmand, A Joudaki, F Bach. Advances in Neural Information Processing Systems 34, 4896-4906, 2021. Cited by 32.
A time-aware recommender system based on dependency network of items. SM Daneshmand, A Javari, SE Abtahi, M Jalili. The Computer Journal 58 (9), 1955-1966, 2015. Cited by 24.
Revisiting the role of Euler numerical integration on acceleration and stability in convex optimization. P Zhang, A Orvieto, H Daneshmand, T Hofmann, RS Smith. International Conference on Artificial Intelligence and Statistics, 3979-3987, 2021. Cited by 10.
On the impact of activation and normalization in obtaining isometric embeddings at initialization. A Joudaki, H Daneshmand, F Bach. Advances in Neural Information Processing Systems 36, 39855-39875, 2023. Cited by 5.
Efficient displacement convex optimization with particle gradient descent. H Daneshmand, JD Lee, C Jin. International Conference on Machine Learning, 6836-6854, 2023. Cited by 4.
Rethinking the variational interpretation of accelerated optimization methods. P Zhang, A Orvieto, H Daneshmand. Advances in Neural Information Processing Systems 34, 14396-14406, 2021. Cited by 4*.
Towards training without depth limits: Batch normalization without gradient explosion. A Meterez, A Joudaki, F Orabona, A Immer, G Rätsch, H Daneshmand. arXiv preprint arXiv:2310.02012, 2023. Cited by 3.
On bridging the gap between mean field and finite width deep random multilayer perceptron with batch normalization. A Joudaki, H Daneshmand, F Bach. International Conference on Machine Learning, 15388-15400, 2023. Cited by 2.
Polynomial-time sparse measure recovery. H Daneshmand, F Bach. arXiv preprint arXiv:2204.07879, 2022. Cited by 2.
Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning. J Wang, E Blaser, H Daneshmand, S Zhang. arXiv preprint arXiv:2405.13861, 2024. Cited by 1.