Generative adversarial imitation learning

S Teng, X Hu, P Deng, B Li, Y Li, Y Ai… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Intelligent vehicles (IVs) have gained worldwide attention due to their increased
convenience, safety advantages, and potential commercial value. Despite predictions of …

Cited by 279 Related articles All 5 versions

[PDF] researchgate.net

Deep reinforcement learning in smart manufacturing: A review and prospects

C Li, P Zheng, Y Yin, B Wang, L Wang - CIRP Journal of Manufacturing …, 2023 - Elsevier

To facilitate the personalized smart manufacturing paradigm with cognitive automation
capabilities, Deep Reinforcement Learning (DRL) has attracted ever-increasing attention by …

Cited by 135 Related articles All 4 versions

[PDF] neurips.cc

Video pretraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokov… - Advances in …, 2022 - proceedings.neurips.cc

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …

Cited by 219 Related articles All 6 versions

[PDF] ieee.org

End-to-end autonomous driving: Challenges and frontiers

L Chen, P Wu, K Chitta, B Jaeger… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

The autonomous driving community has witnessed a rapid growth in approaches that
embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle …

Cited by 118 Related articles All 4 versions

A survey on trajectory-prediction methods for autonomous driving

Y Huang, J Du, Z Yang, Z Zhou… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

In order to drive safely in a dynamic environment, autonomous vehicles should be able to
predict the future states of traffic participants nearby, especially surrounding vehicles, similar …

Cited by 333 Related articles All 2 versions

[PDF] mlr.press

Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

B Zhu, M Jordan, J Jiao - International Conference on …, 2023 - proceedings.mlr.press

We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …

Cited by 124 Related articles All 8 versions

[PDF] thecvf.com

Dataset distillation by matching training trajectories

G Cazenavette, T Wang, A Torralba… - Proceedings of the …, 2022 - openaccess.thecvf.com

Dataset distillation is the task of synthesizing a small dataset such that a model trained on
the synthetic set will match the test accuracy of the model trained on the full dataset. In this …

Cited by 288 Related articles All 9 versions

[PDF] arxiv.org

Eureka: Human-level reward design via coding large language models

YJ Ma, W Liang, G Wang, DA Huang, O Bastani… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have excelled as high-level semantic planners for
sequential decision-making tasks. However, harnessing them to learn complex low-level …

Cited by 145 Related articles All 7 versions

[PDF] researchgate.net

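[PDF] Research progress and prospects of generative adversarial networks (GAN)

Wang Kunfeng, Gou Chao, Duan Yanjie, Lin Yilun, Zheng Xinhu, Wang Feiyue - Acta Automatica Sinica, 2017 - researchgate.net

Abstract: Generative adversarial networks (GANs) have become a popular research direction in the
artificial intelligence community. The basic idea of GANs comes from the two-player zero-sum game in game theory; a GAN consists of a generator and a discriminator …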

Cited by 118 Related articles All 6 versions

[PDF] neurips.cc

Behavior Transformers: Cloning k modes with one stone

NM Shafiullah, Z Cui… - Advances in neural …, 2022 - proceedings.neurips.cc

While behavior learning has made impressive progress in recent times, it lags behind
computer vision and natural language processing due to its inability to leverage large …

Cited by 130 Related articles All 6 versions