The sound of pixels H Zhao, C Gan, A Rouditchenko, C Vondrick, J McDermott, A Torralba Proceedings of the European conference on computer vision (ECCV), 570-586, 2018 | 583 | 2018 |
AVLnet: Learning audio-visual language representations from instructional videos A Rouditchenko, A Boggust, D Harwath, B Chen, D Joshi, S Thomas, ... Proc. Interspeech 2021, 1584-1588, 2021 | 144 | 2021 |
Everything at once - multi-modal fusion transformer for video retrieval N Shvetsova, B Chen, A Rouditchenko, S Thomas, B Kingsbury, RS Feris, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 142 | 2022 |
Self-supervised audio-visual co-segmentation A Rouditchenko, H Zhao, C Gan, J McDermott, A Torralba ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 131 | 2019 |
Contrastive audio-visual masked autoencoder Y Gong, A Rouditchenko, AH Liu, D Harwath, L Karlinsky, H Kuehne, ... arXiv preprint arXiv:2210.07839, 2022 | 104 | 2022 |
Multimodal clustering networks for self-supervised learning from unlabeled videos B Chen, A Rouditchenko, K Duarte, H Kuehne, S Thomas, A Boggust, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 85 | 2021 |
Cross-modal discrete representation learning AH Liu, SY Jin, CIJ Lai, A Rouditchenko, A Oliva, J Glass arXiv preprint arXiv:2106.05438, 2021 | 42 | 2021 |
CMKD: CNN/Transformer-based cross-model knowledge distillation for audio classification Y Gong, S Khurana, A Rouditchenko, J Glass arXiv preprint arXiv:2203.06760, 2022 | 32 | 2022 |
UAVM: Towards unifying audio and visual models Y Gong, AH Liu, A Rouditchenko, J Glass IEEE Signal Processing Letters 29, 2437-2441, 2022 | 19* | 2022 |
Comparison of multilingual self-supervised and weakly-supervised speech pre-training for adaptation to unseen languages A Rouditchenko, S Khurana, S Thomas, R Feris, L Karlinsky, H Kuehne, ... arXiv preprint arXiv:2305.12606, 2023 | 11 | 2023 |
C2KD: Cross-lingual cross-modal knowledge distillation for multilingual text-video retrieval A Rouditchenko, YS Chuang, N Shvetsova, S Thomas, R Feris, ... ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 6 | 2023 |
Cascaded Multilingual Audio-Visual Learning from Videos A Rouditchenko, A Boggust, D Harwath, S Thomas, H Kuehne, B Chen, ... Proc. Interspeech 2021, 3006-3010, 2021 | 6 | 2021 |
Label-efficient audio classification through multitask learning and self-supervision T Lee, T Gong, S Padhy, A Rouditchenko, A Ndirango arXiv preprint arXiv:1910.12587, 2019 | 6 | 2019 |
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset I Palmer, A Rouditchenko, A Barbu, B Katz, J Glass Proc. Interspeech 2021, 3650-3654, 2021 | 5 | 2021 |
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions B Chen, N Shvetsova, A Rouditchenko, D Kondermann, S Thomas, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 3 | 2024 |
AV-CPL: Continuous pseudo-labeling for audio-visual speech recognition A Rouditchenko, R Collobert, T Likhomanenko arXiv preprint arXiv:2309.17395, 2023 | 1 | 2023 |
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation A Rouditchenko, Y Gong, S Thomas, L Karlinsky, H Kuehne, R Feris, ... arXiv preprint arXiv:2406.10082, 2024 | | 2024 |
Learning Audio-Video Language Representations A Rouditchenko Massachusetts Institute of Technology, 2021 | | 2021 |