LXMERT: Learning cross-modality encoder representations from transformers. H Tan, M Bansal. Proceedings of the 2019 Conference on Empirical Methods in Natural Language …, 2019. Cited by 2437.
Unifying vision-and-language tasks via text generation. J Cho, J Lei, H Tan, M Bansal. International Conference on Machine Learning, 1931-1942, 2021. Cited by 468.
How much can CLIP benefit vision-and-language tasks? S Shen, LH Li, H Tan, M Bansal, A Rohrbach, KW Chang, Z Yao, et al. arXiv preprint arXiv:2107.06383, 2021. Cited by 371.
Learning to navigate unseen environments: Back translation with environmental dropout. H Tan, L Yu, M Bansal. arXiv preprint arXiv:1904.04195, 2019. Cited by 304.
A joint speaker-listener-reinforcer model for referring expressions. L Yu, H Tan, M Bansal, TL Berg. Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2017. Cited by 294.
Vokenization: Improving language understanding with contextualized, visual-grounded supervision. H Tan, M Bansal. arXiv preprint arXiv:2010.06775, 2020. Cited by 122.
LRM: Large reconstruction model for single image to 3D. Y Hong, K Zhang, J Gu, S Bi, Y Zhou, D Liu, F Liu, K Sunkavalli, T Bui, et al. arXiv preprint arXiv:2311.04400, 2023. Cited by 90.
Instant3D: Fast text-to-3D with sparse-view generation and large reconstruction model. J Li, H Tan, K Zhang, Z Xu, F Luan, Y Xu, Y Hong, K Sunkavalli, et al. arXiv preprint arXiv:2311.06214, 2023. Cited by 74.
VIMPAC: Video pre-training via masked token prediction and contrastive learning. H Tan, J Lei, T Wolf, M Bansal. arXiv preprint arXiv:2106.11250, 2021. Cited by 63.
EnvEdit: Environment editing for vision-and-language navigation. J Li, H Tan, M Bansal. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022. Cited by 62.
Enabling robots to understand incomplete natural language instructions using commonsense reasoning. H Chen, H Tan, A Kuntz, M Bansal, R Alterovitz. 2020 IEEE International Conference on Robotics and Automation (ICRA), 1963-1969, 2020. Cited by 53.
Diagnosing the environment bias in vision-and-language navigation. Y Zhang, H Tan, M Bansal. arXiv preprint arXiv:2005.03086, 2020. Cited by 51.
Expressing visual relationships via language. H Tan, F Dernoncourt, Z Lin, T Bui, M Bansal. arXiv preprint arXiv:1906.07689, 2019. Cited by 43.
DMV3D: Denoising multi-view diffusion using 3D large reconstruction model. Y Xu, H Tan, F Luan, S Bi, P Wang, J Li, Z Shi, K Sunkavalli, G Wetzstein, et al. arXiv preprint arXiv:2311.09217, 2023. Cited by 42.
The curse of performance instability in analysis datasets: Consequences, source, and suggestions. X Zhou, Y Nie, H Tan, M Bansal. arXiv preprint arXiv:2004.13606, 2020. Cited by 38.
An Effective Framework for Weakly-Supervised Phrase Grounding. Q Wang, H Tan, S Shen, M Mahoney, Z Yao. Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020. Cited by 35*.
Improving cross-modal alignment in vision language navigation via syntactic information. J Li, H Tan, M Bansal. arXiv preprint arXiv:2104.09580, 2021. Cited by 33.
PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction. P Wang, H Tan, S Bi, Y Xu, F Luan, K Sunkavalli, W Wang, Z Xu, K Zhang. arXiv preprint arXiv:2311.12024, 2023. Cited by 27.
VidLanKD: Improving language understanding via video-distilled knowledge transfer. Z Tang, J Cho, H Tan, M Bansal. Advances in Neural Information Processing Systems 34, 24468-24481, 2021. Cited by 27.
Scaling data generation in vision-and-language navigation. Z Wang, J Li, Y Hong, Y Wang, Q Wu, M Bansal, S Gould, H Tan, Y Qiao. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023. Cited by 23.