| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| AdaGrad stepsizes: Sharp convergence over nonconvex landscapes | R Ward, X Wu, L Bottou | Journal of Machine Learning Research 21 (219), 1-30, 2020 | 326 | 2020 |
| ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers | Z Yao, R Yazdani Aminabadi, M Zhang, X Wu, C Li, Y He | Advances in Neural Information Processing Systems 35, 27168-27183, 2022 | 248 | 2022 |
| When do curricula work? | X Wu, E Dyer, B Neyshabur | arXiv preprint arXiv:2012.03107, 2020 | 126 | 2020 |
| WNGrad: Learn the learning rate in gradient descent | X Wu, R Ward, L Bottou | arXiv preprint arXiv:1803.02865, 2018 | 92 | 2018 |
| AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization | R Ward, X Wu, L Bottou | arXiv preprint arXiv:1806.01811, 2018 | 89 | 2018 |
| Global convergence of adaptive gradient methods for an over-parameterized neural network | X Wu, SS Du, R Ward | arXiv preprint arXiv:1902.07111, 2019 | 67 | 2019 |
| ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation | Z Yao, X Wu, C Li, S Youn, Y He | arXiv preprint arXiv:2303.08302, 2023 | 57* | 2023 |
| Hierarchical learning for generation with long source sequences | T Rohde, X Wu, Y Liu | arXiv preprint arXiv:2104.07545, 2021 | 55 | 2021 |
| Linear convergence of adaptive stochastic gradient descent | Y Xie, X Wu, R Ward | International Conference on Artificial Intelligence and Statistics, 1475-1485, 2020 | 53 | 2020 |
| Choosing the sample with lowest loss makes SGD robust | V Shah, X Wu, S Sanghavi | International Conference on Artificial Intelligence and Statistics, 2120-2130, 2020 | 46 | 2020 |
| DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales | Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ... | arXiv preprint arXiv:2308.01320, 2023 | 41 | 2023 |
| Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases | X Wu, C Li, RY Aminabadi, Z Yao, Y He | arXiv preprint arXiv:2301.12017, 2023 | 30* | 2023 |
| Value-at-Risk estimation with stochastic interest rate models for option-bond portfolios | X Wang, D Xie, J Jiang, X Wu, J He | Finance Research Letters 21, 10-20, 2017 | 29 | 2017 |
| Implicit regularization and convergence for weight normalization | X Wu, E Dobriban, T Ren, S Wu, Z Li, S Gunasekar, R Ward, Q Liu | Advances in Neural Information Processing Systems 33, 2835-2847, 2020 | 24* | 2020 |
| XTC: Extreme compression for pre-trained transformers made simple and efficient | X Wu, Z Yao, M Zhang, C Li, Y He | Advances in Neural Information Processing Systems 35, 3217-3231, 2022 | 22 | 2022 |
| ZeRO++: Extremely efficient collective communication for giant model training | G Wang, H Qin, SA Jacobs, C Holmes, S Rajbhandari, O Ruwase, F Yan, ... | arXiv preprint arXiv:2306.10209, 2023 | 20 | 2023 |
| MLPruning: A multilevel structured pruning framework for transformer-based models | Z Yao, L Ma, S Shen, K Keutzer, MW Mahoney | arXiv preprint arXiv:2105.14636, 2021 | 14 | 2021 |
| RenAIssance: A survey into AI text-to-image generation in the era of large model | F Bie, Y Yang, Z Zhou, A Ghanem, M Zhang, Z Yao, X Wu, C Holmes, ... | arXiv preprint arXiv:2309.00810, 2023 | 13 | 2023 |
| Random-LTD: Random and layerwise token dropping brings efficient training for large-scale transformers | Z Yao, X Wu, C Li, C Holmes, M Zhang, C Li, Y He | arXiv preprint arXiv:2211.11586, 2022 | 13 | 2022 |
| ZeroQuant-FP: A leap forward in LLMs post-training W4A8 quantization using floating-point formats | X Wu, Z Yao, Y He | arXiv preprint arXiv:2307.09782, 2023 | 12 | 2023 |