Transformer Hawkes Process S Zuo, H Jiang, Z Li, T Zhao, H Zha International Conference on Machine Learning, 11692-11702, 2020 | 279 | 2020 |
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach Y Yu, S Zuo, H Jiang, W Ren, T Zhao, C Zhang arXiv preprint arXiv:2010.07835, 2020 | 113 | 2020 |
Taming Sparsely Activated Transformer with Stochastic Experts S Zuo, X Liu, J Jiao, YJ Kim, H Hassan, R Zhang, T Zhao, J Gao arXiv preprint arXiv:2110.04260, 2021 | 83 | 2021 |
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance Q Zhang, S Zuo, C Liang, A Bukharin, P He, W Chen, T Zhao International Conference on Machine Learning, 26809-26823, 2022 | 55 | 2022 |
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization C Liang, S Zuo, M Chen, H Jiang, X Liu, P He, T Zhao, W Chen arXiv preprint arXiv:2105.12002, 2021 | 48 | 2021 |
Less is More: Task-aware Layer-wise Distillation for Language Model Compression C Liang, S Zuo, Q Zhang, P He, W Chen, T Zhao arXiv preprint arXiv:2210.01351, 2022 | 35 | 2022 |
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation S Zuo, Q Zhang, C Liang, P He, T Zhao, W Chen arXiv preprint arXiv:2204.07675, 2022 | 29 | 2022 |
Efficient Long Sequence Modeling via State Space Augmented Transformer S Zuo, X Liu, J Jiao, D Charles, E Manavoglu, T Zhao, J Gao arXiv preprint arXiv:2212.08136, 2022 | 27 | 2022 |
A Hypergradient Approach to Robust Regression without Correspondence Y Xie, Y Mao, S Zuo, H Xu, X Ye, T Zhao, H Zha arXiv preprint arXiv:2012.00123, 2020 | 15 | 2020 |
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models C Liang, H Jiang, S Zuo, P He, X Liu, J Gao, W Chen, T Zhao arXiv preprint arXiv:2202.02664, 2022 | 14 | 2022 |
Self-Training with Differentiable Teacher S Zuo, Y Yu, C Liang, H Jiang, S Er, C Zhang, T Zhao, H Zha arXiv preprint arXiv:2109.07049, 2021 | 14 | 2021 |
Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach S Zuo, C Liang, H Jiang, X Liu, P He, J Gao, W Chen, T Zhao arXiv preprint arXiv:2104.04886, 2021 | 14* | 2021 |
Tensor Maps for Synchronizing Heterogeneous Shape Collections Q Huang, Z Liang, H Wang, S Zuo, C Bajaj ACM Transactions on Graphics (TOG) 38 (4), 1-18, 2019 | 12 | 2019 |
Adversarially Regularized Policy Learning Guided by Trajectory Optimization Z Zhao, S Zuo, T Zhao, Y Zhao Learning for Dynamics and Control Conference, 844-857, 2022 | 11 | 2022 |
Context-Aware Query Rewriting for Improving Users’ Search Experience on E-commerce Websites S Zuo, Q Yin, H Jiang, S Xi, B Yin, C Zhang, T Zhao arXiv preprint arXiv:2209.07584, 2022 | 6 | 2022 |
Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms A Bukharin, Y Li, Y Yu, Q Zhang, Z Chen, S Zuo, C Zhang, S Zhang, ... arXiv preprint arXiv:2310.10810, 2023 | 5 | 2023 |
ARCH: Efficient Adversarial Regularized Training with Caching S Zuo, C Liang, H Jiang, P He, X Liu, J Gao, W Chen, T Zhao arXiv preprint arXiv:2109.07048, 2021 | 3 | 2021 |
SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process Z Li, Y Xu, S Zuo, H Jiang, C Zhang, T Zhao, H Zha | 2 | 2023 |
DiP-GNN: Discriminative Pre-Training of Graph Neural Networks S Zuo, H Jiang, Q Yin, X Tang, B Yin, T Zhao arXiv preprint arXiv:2209.07499, 2022 | 2 | 2022 |
Differentially Private Estimation of Hawkes Process S Zuo, T Liu, T Zhao, H Zha arXiv preprint arXiv:2209.07303, 2022 | 2 | 2022 |