DINOv2: Learning Robust Visual Features without Supervision M Oquab, T Darcet, T Moutakanni, H Vo, M Szafraniec, V Khalidov, ... Transactions on Machine Learning Research Journal, 2023 | 1549* | 2023 |
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning L Yu, B Shi, R Pasunuru, B Muller, O Golovneva, T Wang, A Babu, B Tang, ... arXiv preprint arXiv:2309.02591, 2023 | 88 | 2023 |
Demystifying CLIP Data H Xu, S Xie, XE Tan, PY Huang, R Howes, V Sharma, SW Li, G Ghosh, ... International Conference on Learning Representations, Vienna 2024, 2023 | 71 | 2023 |
MAViL: Masked audio-video learners PY Huang, V Sharma, H Xu, C Ryali, Y Li, SW Li, G Ghosh, J Malik, ... Advances in Neural Information Processing Systems 36, 2024 | 45 | 2024 |
Attend and attack: Attention guided adversarial attacks on visual question answering models V Sharma, A Kalra, SC Vaibhav, L Patel, LP Morency Proc. 32nd Conf. Neural Inf. Process. Syst.(NeurIPS), 1-6, 2018 | 23 | 2018 |
Alexa Arena: A user-centric interactive platform for embodied AI Q Gao, G Thattai, S Shakiah, X Gao, S Pansare, V Sharma, G Sukhatme, ... Advances in Neural Information Processing Systems 36, 2024 | 20 | 2024 |
Multimodal behavioral markers exploring suicidal intent in social media videos AP Shah, V Vaibhav, V Sharma, M Al Ismail, J Girard, LP Morency 2019 International Conference on Multimodal Interaction, 409-413, 2019 | 19 | 2019 |
BioAMA: towards an end to end biomedical question answering system V Sharma, N Kulkarni, S Pranavi, G Bayomi, E Nyberg, T Mitamura Proceedings of the BioNLP 2018 workshop, 109-117, 2018 | 19 | 2018 |
Chameleon: Mixed-Modal Early-Fusion Foundation Models C Team arXiv preprint arXiv:2405.09818, 2024 | 14 | 2024 |
Community regularization of visually-grounded dialog A Agarwal, G Swaminathan, V Sharma, M Lewis, K Sycara Proceedings of 18th International Conference on Autonomous Agents and …, 2019 | 14 | 2019 |
Analyzing newspaper crime reports for identification of safe transit paths V Sharma, R Kulshreshtha, P Singh, N Agrawal, A Kumar Proceedings of the 2015 Conference of the North American Chapter of the …, 2015 | 13 | 2015 |
An Introduction to Vision-Language Modeling F Bordes, RY Pang, A Ajay, AC Li, A Bardes, S Petryk, O Mañas, Z Lin, ... arXiv preprint arXiv:2405.17247, 2024 | 11 | 2024 |
Automatic tagging and retrieval of E-Commerce products based on visual features V Sharma, H Karnick Proceedings of the NAACL Student Research Workshop, 22-28, 2016 | 10 | 2016 |
Image summarization using topic modelling V Sharma, A Kumar, N Agrawal, P Singh, R Kulshreshtha 2015 IEEE International Conference on Signal and Image Processing …, 2015 | 10 | 2015 |
Segmentation guided attention networks for visual question answering V Sharma, A Bishnu, L Patel Proceedings of ACL 2017, Student Research Workshop, 43-48, 2017 | 9 | 2017 |
FLAP: Fast Language-Audio Pre-training CF Yeh, PY Huang, V Sharma, SW Li, G Ghosh Proceedings of IEEE Automatic Speech Recognition and Understanding 2023, 2023 | 6 | 2023 |
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions J Urbanek, F Bordes, P Astolfi, M Williamson, V Sharma, ... The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, 2023 | 6 | 2023 |
Alexa, play with robot: Introducing the first Alexa Prize SimBot Challenge on embodied AI H Shi, L Ball, G Thattai, D Zhang, L Hu, Q Gao, S Shakiah, X Gao, ... arXiv preprint arXiv:2308.05221, 2023 | 5 | 2023 |
Tweet Based Reach Aware Temporal Attention Network for NFT Valuation R Sawhney, M Thakkar, R Soun, A Neerkaje, V Sharma, D Guhathakurta, ... Findings of the Association for Computational Linguistics: EMNLP 2022, 6321-6332, 2022 | 5 | 2022 |
Cyclegen: Cyclic consistency based product review generator from attributes V Sharma, HV Sharma, A Bishnu, L Patel Proceedings of the 11th International Conference on Natural Language …, 2018 | 5 | 2018 |