Tianhao Wu
Title
Cited by
Year
Sanity-checking pruning methods: Random tickets can win the jackpot
J Su, Y Chen, T Cai, T Wu, R Gao, L Wang, JD Lee
Advances in neural information processing systems 33, 20390-20401, 2020
Cited by 81, 2020
Starling-7B: Improving LLM helpfulness & harmlessness with RLAIF
B Zhu, E Frick, T Wu, H Zhu, J Jiao
November 2023
Cited by 54, 2023
Pairwise proximal policy optimization: Harnessing relative feedback for LLM alignment
T Wu, B Zhu, R Zhang, Z Wen, K Ramchandran, J Jiao
arXiv preprint arXiv:2310.00212, 2023
Cited by 23, 2023
On reinforcement learning with adversarial corruption and its application to block MDP
T Wu, Y Yang, S Du, L Wang
International Conference on Machine Learning, 11296-11306, 2021
Cited by 17, 2021
Nearly optimal policy optimization with stable at any time guarantee
T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao
International Conference on Machine Learning, 24243-24265, 2022
Cited by 15, 2022
Meta-rewarding language models: Self-improving alignment with LLM-as-a-meta-judge
T Wu, W Yuan, O Golovneva, J Xu, Y Tian, J Jiao, J Weston, S Sukhbaatar
arXiv preprint arXiv:2407.19594, 2024
Cited by 8, 2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
T Li, WL Chiang, E Frick, L Dunlap, T Wu, B Zhu, JE Gonzalez, I Stoica
arXiv preprint arXiv:2406.11939, 2024
Cited by 5, 2024
A reduction-based framework for sequential decision making with delayed feedback
Y Yang, H Zhong, T Wu, B Liu, L Wang, SS Du
Advances in Neural Information Processing Systems 36, 2024
Cited by 4, 2024
Statistical inference on multi-armed bandits with delayed feedback
L Shi, J Wang, T Wu
International Conference on Machine Learning, 31328-31352, 2023
Cited by 4, 2023
A reduction-based framework for conservative bandits and reinforcement learning
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
arXiv preprint arXiv:2106.11692, 2021
Cited by 4, 2021
RouteLLM: Learning to route LLMs with preference data
I Ong, A Almahairi, V Wu, WL Chiang, T Wu, JE Gonzalez, MW Kadous, ...
arXiv preprint arXiv:2406.18665, 2024
Cited by 3, 2024
A unified framework for conservative exploration
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
arXiv preprint arXiv:2106.11692, 2021
Cited by 2, 2021
Pairwise Proximal Policy Optimization: Large Language Models Alignment via Comparative RL
T Wu, B Zhu, R Zhang, Z Wen, K Ramchandran, J Jiao
2024
Starling-7B: Improving Helpfulness and Harmlessness with RLAIF
B Zhu, E Frick, T Wu, H Zhu, K Ganesan, WL Chiang, J Zhang, J Jiao
First Conference on Language Modeling
Pairwise Proximal Policy Optimization: Language Model Alignment with Comparative RL
T Wu, B Zhu, R Zhang, Z Wen, K Ramchandran, J Jiao
First Conference on Language Modeling
Articles 1–15