Multimodal procedural planning via dual text-image prompting

On the prospects of incorporating large language models (LLMs) in automated planning and scheduling (APS)

V Pallagani, BC Muppasani, K Roy, F Fabiano… - Proceedings of the …, 2024 - ojs.aaai.org

Abstract: Automated Planning and Scheduling is among the growing areas in Artificial
Intelligence (AI) where mention of LLMs has gained popularity. Based on a comprehensive …

Cited by 15 | Related articles | All 5 versions

Large language models for robotics: Opportunities, challenges, and perspectives

J Wang, Z Wu, Y Li, H Jiang, P Shu, E Shi, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have undergone significant expansion and have been
increasingly integrated across various domains. Notably, in the realm of robot task planning …

Cited by 16 | Related articles | All 3 versions

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Cited by 1899 | Related articles | All 4 versions

Chameleon: Plug-and-play compositional reasoning with large language models

P Lu, B Peng, H Cheng, M Galley… - Advances in …, 2024 - proceedings.neurips.cc

Large language models (LLMs) have achieved remarkable progress in solving various
natural language processing tasks due to emergent reasoning abilities. However, LLMs …

Cited by 256 | Related articles | All 10 versions

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

Cited by 241 | Related articles | All 6 versions

A systematic survey of prompt engineering on vision-language foundation models

J Gu, Z Han, S Chen, A Beirami, B He, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …

Cited by 76 | Related articles | All 3 versions

Grounded decoding: Guiding text generation with grounded models for robot control

W Huang, F Xia, D Shah, D Driess, A Zeng, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent progress in large language models (LLMs) has demonstrated the ability to learn and
leverage Internet-scale knowledge through pre-training with autoregressive models …

Cited by 60 | Related articles

Grounded decoding: Guiding text generation with grounded models for embodied agents

W Huang, F Xia, D Shah, D Driess… - Advances in …, 2024 - proceedings.neurips.cc

Recent progress in large language models (LLMs) has demonstrated the ability to learn and
leverage Internet-scale knowledge through pre-training with autoregressive models …

Cited by 9 | Related articles | All 5 versions

RoboScript: Code generation for free-form manipulation tasks across real and simulation

J Chen, Y Mu, Q Yu, T Wei, S Wu, Z Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org

Rapid progress in high-level task planning and code generation for open-world robot
manipulation has been witnessed in Embodied AI. However, previous studies put much …

Cited by 3 | Related articles | All 2 versions

Every Problem, Every Step, All In Focus: Learning to Solve Vision-Language Problems with Integrated Attention

X Chen, J Yang, S Chen, L Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Integrating information from vision and language modalities has sparked interesting
applications in the fields of computer vision and natural language processing. Existing …

Cited by 1 | Related articles | All 6 versions