GPT-4V(ision) is a Generalist Web Agent, if Grounded B Zheng, B Gou, J Kil, H Sun, Y Su International Conference on Machine Learning (ICML), 2024 | 64 | 2024 |
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones CH Song, J Kil, TY Pan, BM Sadler, WL Chao, Y Su IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 | 24 | 2022 |
PreSTU: Pre-Training for Scene-Text Understanding J Kil, S Changpinyo, X Chen, H Hu, S Goodman, WL Chao, R Soricut IEEE/CVF International Conference on Computer Vision (ICCV), 2023 | 21 | 2023 |
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering J Kil, C Zhang, D Xuan, WL Chao Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 | 20 | 2021 |
Revisiting Document Representations for Large-Scale Zero-Shot Learning J Kil, WL Chao NAACL, 2021 | 7 | 2021 |
Dual-View Visual Contextualization for Web Navigation J Kil, CH Song, B Zheng, X Deng, Y Su, WL Chao IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 | 3 | 2024 |
CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs J Kil, Z Mai, J Lee, Z Wang, K Cheng, L Wang, Y Liu, A Chowdhury, ... arXiv preprint arXiv:2407.16837, 2024 | 1 | 2024 |
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback JS Byun, J Chun, J Kil, A Perrault arXiv preprint arXiv:2407.00087, 2024 | | 2024 |
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering J Kil, F Tavazoee, D Kang, JK Kim Annual Meeting of the Association for Computational Linguistics (ACL), Findings, 2024 | | 2024 |