Motion planning for autonomous driving: The state of the art and future perspectives

S Teng, X Hu, P Deng, B Li, Y Li, Y Ai… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Intelligent vehicles (IVs) have gained worldwide attention due to their increased
convenience, safety advantages, and potential commercial value. Despite predictions of …

Deep learning, reinforcement learning, and world models

Y Matsuo, Y LeCun, M Sahani, D Precup, D Silver… - Neural Networks, 2022 - Elsevier
Deep learning (DL) and reinforcement learning (RL) methods appear to be indispensable
for achieving human-level or super-human AI systems. On the other …

End-to-end autonomous driving: Challenges and frontiers

L Chen, P Wu, K Chitta, B Jaeger… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The autonomous driving community has witnessed a rapid growth in approaches that
embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

B Zhu, M Jordan, J Jiao - International Conference on …, 2023 - proceedings.mlr.press
We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …

A survey on trajectory-prediction methods for autonomous driving

Y Huang, J Du, Z Yang, Z Zhou… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
In order to drive safely in a dynamic environment, autonomous vehicles should be able to
predict the future states of traffic participants nearby, especially surrounding vehicles, similar …

Eureka: Human-level reward design via coding large language models

YJ Ma, W Liang, G Wang, DA Huang, O Bastani… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have excelled as high-level semantic planners for
sequential decision-making tasks. However, harnessing them to learn complex low-level …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Social interactions for autonomous driving: A review and perspectives

W Wang, L Wang, C Zhang, C Liu… - Foundations and Trends …, 2022 - nowpublishers.com
No human drives a car in a vacuum; she/he must negotiate with other road users to achieve
their goals in social traffic scenes. A rational human driver can interact with other road users …

How to train your robot with deep reinforcement learning: lessons we have learned

J Ibarz, J Tan, C Finn, M Kalakrishnan… - … Journal of Robotics …, 2021 - journals.sagepub.com
Deep reinforcement learning (RL) has emerged as a promising approach for autonomously
acquiring complex behaviors from low-level sensor observations. Although a large portion of …