Hiteshi Sharma
Verified email at microsoft.com
Title
Cited by
Year
Phi-3 technical report: A highly capable language model locally on your phone
M Abdin, SA Jacobs, AA Awan, J Aneja, A Awadallah, H Awadalla, ...
arXiv preprint arXiv:2404.14219, 2024
Cited by: 150 | Year: 2024
Model-free reinforcement learning in infinite-horizon average-reward markov decision processes
CY Wei, MJ Jahromi, H Luo, H Sharma, R Jain
International conference on machine learning, 10170-10180, 2020
Cited by: 102 | Year: 2020
Evaluating cognitive maps and planning in large language models with CogEval
I Momennejad, H Hasanbeig, F Vieira Frujeri, H Sharma, N Jojic, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by: 33 | Year: 2024
Fine-tuning language models with advantage-induced policy alignment
B Zhu, H Sharma, FV Frujeri, S Dong, C Zhu, MI Jordan, J Jiao
arXiv preprint arXiv:2306.02231, 2023
Cited by: 27 | Year: 2023
A universal empirical dynamic programming algorithm for continuous state MDPs
WB Haskell, R Jain, H Sharma, P Yu
IEEE Transactions on Automatic Control 65 (1), 115-129, 2019
Cited by: 20 | Year: 2019
Approximate relative value learning for average-reward continuous state MDPs
H Sharma, M Jafarnia-Jahromi, R Jain
Uncertainty in Artificial Intelligence, 956-964, 2020
Cited by: 16 | Year: 2020
An empirical relative value learning algorithm for non-parametric MDPs with continuous state space
H Sharma, R Jain, A Gupta
2019 18th European Control Conference (ECC), 1368-1373, 2019
Cited by: 13 | Year: 2019
Language models can be logical solvers
J Feng, R Xu, J Hao, H Sharma, Y Shen, D Zhao, W Chen
arXiv preprint arXiv:2311.06158, 2023
Cited by: 9 | Year: 2023
Evaluating cognitive maps in large language models with cogeval: No emergent planning
I Momennejad, H Hasanbeig, FV Frujeri, H Sharma, RO Ness, N Jojic, ...
Advances in neural information processing systems 37, 2023
Cited by: 9 | Year: 2023
Randomized function fitting-based empirical value iteration
WB Haskell, P Yu, H Sharma, R Jain
2017 IEEE 56th Annual Conference on Decision and Control (CDC), 2467-2472, 2017
Cited by: 9 | Year: 2017
Allure: A systematic protocol for auditing and improving llm-based evaluation of text using iterative in-context-learning
H Hasanbeig, H Sharma, L Betthauser, FV Frujeri, I Momennejad
arXiv preprint arXiv:2309.13701, 2023
Cited by: 8 | Year: 2023
An empirical dynamic programming algorithm for continuous MDPs
WB Haskell, R Jain, H Sharma, P Yu
arXiv preprint arXiv:1709.07506, 2017
Cited by: 8 | Year: 2017
An approximately optimal relative value learning algorithm for averaged MDPs with continuous states and actions
H Sharma, R Jain
2019 57th Annual Allerton Conference on Communication, Control, and …, 2019
Cited by: 7 | Year: 2019
Optimal spectrum sensing for cognitive radio with imperfect detector
H Sharma, A Patel, SN Merchant, UB Desai
2014 IEEE 79th Vehicular Technology Conference (VTC Spring), 1-5, 2014
Cited by: 4 | Year: 2014
ALLURE: auditing and improving llm-based evaluation of text using iterative in-context-learning
H Hasanbeig, H Sharma, L Betthauser, F Vieira Frujeri, I Momennejad
arXiv e-prints, arXiv: 2309.13701, 2023
Cited by: 3 | Year: 2023
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
S Zhang, D Yu, H Sharma, Z Yang, S Wang, H Hassan, Z Wang
arXiv preprint arXiv:2405.19332, 2024
Cited by: 2 | Year: 2024
Language Models can be Deductive Solvers
J Feng, R Xu, J Hao, H Sharma, Y Shen, D Zhao, W Chen
Findings of the Association for Computational Linguistics: NAACL 2024, 4026-4042, 2024
Cited by: 1 | Year: 2024
Finite Time Guarantees for Continuous State MDPs with Generative Model
H Sharma, R Jain
2020 59th IEEE Conference on Decision and Control (CDC), 3617-3622, 2020
Cited by: 1 | Year: 2020
Randomized Policy Learning for Continuous State and Action MDPs
H Sharma, R Jain
arXiv preprint arXiv:2006.04331, 2020
Cited by: 1 | Year: 2020
Empirical algorithms for general stochastic systems with continuous states and actions
H Sharma, R Jain, W Haskell
2019 IEEE 58th Conference on Decision and Control (CDC), 6344-6349, 2019
Cited by: 1 | Year: 2019