Publication | Cited by | Year |
The marginal value of adaptive gradient methods in machine learning. AC Wilson, R Roelofs, M Stern, N Srebro, B Recht. Advances in Neural Information Processing Systems 30, 2017 | 1264 | 2017 |
Adafactor: Adaptive learning rates with sublinear memory cost. N Shazeer, M Stern. International Conference on Machine Learning, 4596-4604, 2018 | 842 | 2018 |
Abstract syntax networks for code generation and semantic parsing. M Rabinovich, M Stern, D Klein. arXiv preprint arXiv:1704.07535, 2017 | 415 | 2017 |
Insertion transformer: Flexible sequence generation via insertion operations. M Stern, W Chan, J Kiros, J Uszkoreit. International Conference on Machine Learning, 5976-5985, 2019 | 248 | 2019 |
A minimal span-based neural constituency parser. M Stern, J Andreas, D Klein. arXiv preprint arXiv:1705.03919, 2017 | 225 | 2017 |
Stochastic cubic regularization for fast nonconvex optimization. N Tripuraneni, M Stern, C Jin, J Regier, MI Jordan. Advances in Neural Information Processing Systems 31, 2018 | 186 | 2018 |
Blockwise parallel decoding for deep autoregressive models. M Stern, N Shazeer, J Uszkoreit. Advances in Neural Information Processing Systems 31, 2018 | 127 | 2018 |
Kernel feature selection via conditional covariance minimization. J Chen, M Stern, MJ Wainwright, MI Jordan. Advances in Neural Information Processing Systems 30, 2017 | 113 | 2017 |
Imitation attacks and defenses for black-box machine translation systems. E Wallace, M Stern, D Song. arXiv preprint arXiv:2004.15015, 2020 | 106 | 2020 |
What's going on in neural constituency parsers? An analysis. D Gaddy, M Stern, D Klein. arXiv preprint arXiv:1804.07853, 2018 | 76 | 2018 |
Kermit: Generative insertion-based modeling for sequences. W Chan, N Kitaev, K Guu, M Stern, J Uszkoreit. arXiv preprint arXiv:1906.01604, 2019 | 71 | 2019 |
Dynamic posted-price mechanisms for the blockchain transaction-fee market. MVX Ferreira, DJ Moroz, DC Parkes, M Stern. Proceedings of the 3rd ACM Conference on Advances in Financial Technologies …, 2021 | 62 | 2021 |
Effective inference for generative neural parsing. M Stern, D Fried, D Klein. arXiv preprint arXiv:1707.08976, 2017 | 60 | 2017 |
Improving neural parsing by disentangling model combination and reranking effects. D Fried, M Stern, D Klein. arXiv preprint arXiv:1707.03058, 2017 | 42 | 2017 |
Towards end-to-end in-image neural machine translation. E Mansimov, M Stern, M Chen, O Firat, J Uszkoreit, P Jain. arXiv preprint arXiv:2010.10648, 2020 | 21 | 2020 |
Semantic scaffolds for pseudocode-to-code generation. R Zhong, M Stern, D Klein. arXiv preprint arXiv:2005.05927, 2020 | 17 | 2020 |
Insertion-deletion transformer. L Ruis, M Stern, J Proskurnia, W Chan. arXiv preprint arXiv:2001.05540, 2020 | 12 | 2020 |
An empirical study of generation order for machine translation. W Chan, M Stern, J Kiros, J Uszkoreit. arXiv preprint arXiv:1910.13437, 2019 | 8 | 2019 |
Interactive assignments for teaching structured neural NLP. D Gaddy, D Fried, N Kitaev, M Stern, R Corona, J DeNero, D Klein. Proceedings of the Fifth Workshop on Teaching NLP, 104-107, 2021 | 1 | 2021 |
Generating neural network outputs using insertion operations. JD Uszkoreit, MT Stern, JR Kiros, W Chan. US Patent 10,740,571, 2020 | 1 | 2020 |