ESC: Exploration with soft commonsense constraints for zero-shot object navigation K Zhou, K Zheng, C Pryor, Y Shen, H Jin, L Getoor, XE Wang International Conference on Machine Learning, 42829-42842, 2023 | 51 | 2023 |
VLMbench: A compositional benchmark for vision-and-language manipulation K Zheng, X Chen, OC Jenkins, X Wang Advances in Neural Information Processing Systems 35, 665-678, 2022 | 44 | 2022 |
MiniGPT-5: Interleaved vision-and-language generation via generative vokens K Zheng, X He, XE Wang arXiv preprint arXiv:2310.02239, 2023 | 43 | 2023 |
JARVIS: A neuro-symbolic commonsense reasoning framework for conversational embodied agents K Zheng, K Zhou, J Gu, Y Fan, J Wang, Z Di, X He, XE Wang arXiv preprint arXiv:2208.13266, 2022 | 17 | 2022 |
Manipulation-oriented object perception in clutter through affordance coordinate frames X Chen, K Zheng, Z Zeng, C Kisailus, S Basu, J Cooney, J Pavlasek, ... 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids …, 2022 | 4 | 2022 |
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests Y Fan, J Gu, K Zheng, XE Wang arXiv preprint arXiv:2305.14260, 2023 | 2 | 2023 |
Composable Causality in Semantic Robot Programming E Sheetz, X Chen, Z Zeng, K Zheng, Q Shi, OC Jenkins 2022 International Conference on Robotics and Automation (ICRA), 1380-1386, 2022 | 2 | 2022 |
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation Y Zhou, R Zhang, K Zheng, N Zhao, J Gu, Z Wang, XE Wang, T Sun arXiv preprint arXiv:2406.09305, 2024 | 1 | 2024 |
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos X He, W Feng, K Zheng, Y Lu, W Zhu, J Li, Y Fan, J Wang, L Li, Z Yang, ... arXiv preprint arXiv:2406.08407, 2024 | | 2024 |
R2H: Building Multimodal Navigation Helpers that Respond to Help. Y Fan, K Zheng, J Gu, XE Wang CoRR, 2023 | | 2023 |
SlugJARVIS: Multimodal Commonsense Knowledge-based Embodied AI for SimBot Challenge J Gu, K Zheng, K Zhou, Y Fan, X He, J Wang, Z Di, XE Wang | | |
VLMbench: A Benchmark for Vision-and-Language Manipulation K Zheng, X Chen, OC Jenkins, X Wang | | |