UNITER: UNiversal Image-TExt Representation Learning YC Chen*, L Li*, L Yu*, A El Kholy, F Ahmed, Z Gan, Y Cheng, J Liu ECCV, 2020 | 2503* | 2020 |
Modeling context in referring expressions L Yu, P Poirson, S Yang, AC Berg, TL Berg Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The …, 2016 | 1132 | 2016 |
MAttNet: Modular attention network for referring expression comprehension L Yu, Z Lin, X Shen, J Yang, X Lu, M Bansal, TL Berg Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 854 | 2018 |
TVQA: Localized, compositional video question answering J Lei, L Yu, M Bansal, TL Berg arXiv preprint arXiv:1809.01696, 2018 | 654 | 2018 |
HERO: Hierarchical encoder for video+language omni-representation pre-training L Li, YC Chen, Y Cheng, Z Gan, L Yu, J Liu arXiv preprint arXiv:2005.00200, 2020 | 520 | 2020 |
Learning to navigate unseen environments: Back translation with environmental dropout H Tan, L Yu, M Bansal arXiv preprint arXiv:1904.04195, 2019 | 326 | 2019 |
Visual Madlibs: Fill in the blank description generation and question answering L Yu, E Park, AC Berg, TL Berg Proceedings of the IEEE International Conference on Computer Vision, 2461-2469, 2015 | 306* | 2015 |
A joint speaker-listener-reinforcer model for referring expressions L Yu, H Tan, M Bansal, TL Berg Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 297 | 2017 |
TVR: A large-scale dataset for video-subtitle moment retrieval J Lei, L Yu, TL Berg, M Bansal Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 260 | 2020 |
TVQA+: Spatio-temporal grounding for video question answering J Lei, L Yu, TL Berg, M Bansal arXiv preprint arXiv:1904.11574, 2019 | 241 | 2019 |
Vector sparse representation of color image using quaternion matrix analysis Y Xu, L Yu, H Xu, H Zhang, T Nguyen IEEE Transactions on image processing 24 (4), 1315-1329, 2015 | 155 | 2015 |
Physics-inspired garment recovery from a single-view image S Yang, Z Pan, T Amert, K Wang, L Yu, T Berg, MC Lin ACM Transactions on Graphics (TOG) 37 (5), 1-14, 2018 | 152* | 2018 |
Behind the scene: Revealing the secrets of pre-trained vision-and-language models J Cao, Z Gan, Y Cheng, L Yu, YC Chen, J Liu Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 149 | 2020 |
VALUE: A multi-task benchmark for video-and-language understanding evaluation L Li, J Lei, Z Gan, L Yu, YC Chen, R Pillai, Y Cheng, L Zhou, XE Wang, ... arXiv preprint arXiv:2106.04632, 2021 | 107 | 2021 |
Multi-target embodied question answering L Yu, X Chen, G Gkioxari, M Bansal, TL Berg, D Batra Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 103 | 2019 |
The Llama 3 herd of models A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ... arXiv preprint arXiv:2407.21783, 2024 | 97 | 2024 |
Hierarchically-attentive RNN for album summarization and storytelling L Yu, M Bansal, TL Berg arXiv preprint arXiv:1708.02977, 2017 | 81 | 2017 |
VIOLIN: A large-scale dataset for video-and-language inference J Liu, W Chen, Y Cheng, Z Gan, L Yu, Y Yang, J Liu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 72 | 2020 |
What is more likely to happen next? video-and-language future event prediction J Lei, L Yu, TL Berg, M Bansal arXiv preprint arXiv:2010.07999, 2020 | 63 | 2020 |
BachGAN: High-resolution image synthesis from salient object layout Y Li, Y Cheng, Z Gan, L Yu, L Wang, J Liu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 50 | 2020 |