Title. Authors. Venue | Cited by | Year |
How2: a large-scale dataset for multimodal language understanding. R Sanabria, O Caglayan, S Palaskar, D Elliott, L Barrault, L Specia, ... arXiv preprint arXiv:1811.00347, 2018 | 267 | 2018 |
How2Sign: a large-scale multimodal dataset for continuous American Sign Language. A Duarte, S Palaskar, L Ventura, D Ghadiyaram, K DeHaan, F Metze, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 153 | 2021 |
Multimodal abstractive summarization for How2 videos. S Palaskar, J Libovický, S Gella, F Metze. arXiv preprint arXiv:1906.07901, 2019 | 102 | 2019 |
ASR error correction and domain adaptation using machine translation. A Mani, S Palaskar, NV Meripo, S Konam, F Metze. ICASSP 2020: 2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 86 | 2020 |
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking Rosetta” JSALT 2017 workshop. O Scharenborg, L Besacier, A Black, M Hasegawa-Johnson, F Metze, ... 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 54* | 2018 |
End-to-end multimodal speech recognition. S Palaskar, R Sanabria, F Metze. 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 49 | 2018 |
CMU Sinbad’s submission for the DSTC7 AVSD challenge. R Sanabria, S Palaskar, F Metze. DSTC7 at AAAI 2019 workshop 6, 2019 | 43 | 2019 |
Combining LSTM and latent topic modeling for mortality prediction. Y Jo, L Lee, S Palaskar. arXiv preprint arXiv:1709.02842, 2017 | 42 | 2017 |
Building an ASR system for a low-research language through the adaptation of a high-resource language ASR system: preliminary results. O Scharenborg, F Ciannella, S Palaskar, A Black, F Metze, L Ondel, ... Proc. Internat. Conference on Natural Language, Signal and Speech Processing …, 2017 | 39 | 2017 |
Towards understanding ASR error correction for medical conversations. A Mani, S Palaskar, S Konam. Proceedings of the First Workshop on Natural Language Processing for Medical …, 2020 | 33 | 2020 |
Multimodal grounding for sequence-to-sequence speech recognition. O Caglayan, R Sanabria, S Palaskar, L Barrault, F Metze. ICASSP 2019: 2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 32 | 2019 |
Multimodal abstractive summarization for open-domain videos. J Libovický, S Palaskar, S Gella, F Metze. Visually Grounded Interaction and Language (ViGIL), 1-8, 2018 | 30 | 2018 |
Learned in speech recognition: Contextual acoustic word embeddings. S Palaskar, V Raunak, F Metze. ICASSP 2019: 2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 26 | 2019 |
Acoustic-to-word recognition with sequence-to-sequence models. S Palaskar, F Metze. 2018 IEEE Spoken Language Technology Workshop (SLT), 397-404, 2018 | 22 | 2018 |
End-to-end speech summarization using restricted self-attention. R Sharma, S Palaskar, AW Black, F Metze. ICASSP 2022: 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 17 | 2022 |
Learning from multiview correlations in open-domain videos. N Holzenberger, S Palaskar, P Madhyastha, F Metze, R Arora. ICASSP 2019: 2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 15 | 2019 |
Multimodal speech summarization through semantic concept learning. S Palaskar, R Salakhutdinov, AW Black, F Metze. Interspeech, 791-795, 2021 | 9 | 2021 |
Transfer learning for multimodal dialog. S Palaskar, R Sanabria, F Metze. Computer Speech & Language 64, 101093, 2020 | 8 | 2020 |
Speech summarization using restricted self-attention. R Sharma, S Palaskar, AW Black, F Metze. arXiv preprint arXiv:2110.06263, 2021 | 5 | 2021 |
Grounded sequence to sequence transduction. L Specia, L Barrault, O Caglayan, A Duarte, D Elliott, S Gella, ... IEEE Journal of Selected Topics in Signal Processing 14 (3), 577-591, 2020 | 5 | 2020 |