CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations

H Li, Y Kang, T Liu, W Ding, Z Liu - arXiv preprint arXiv:2109.00181, 2021 - arxiv.org
Existing audio-language task-specific predictive approaches focus on building complicated
late-fusion mechanisms. However, these models are facing challenges of overfitting with …

CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations

H Li, W Ding, Y Kang, T Liu, Z Wu, Z Liu - Proceedings of the 2021 …, 2021 - aclanthology.org
Existing audio-language task-specific predictive approaches focus on building complicated
late-fusion mechanisms. However, these models are facing challenges of overfitting with …
