Transfer learning-based nonstationary traffic flow prediction using AdaRNN and DCORAL

L Zang, T Wang, B Zhang, C Li - Expert Systems with Applications, 2024 - Elsevier
Traffic flow prediction is an integral part of an intelligent transportation system (ITS) for
proactive transportation planning and management in public transit network systems …

Audio–visual speech recognition based on regulated transformer and spatio–temporal fusion strategy for driver assistive systems

D Ryumin, A Axyonov, E Ryumina, D Ivanko… - Expert Systems with …, 2024 - Elsevier
This article presents a research methodology for audio–visual speech recognition (AVSR) in
driver assistive systems. These systems necessitate ongoing interaction with drivers while …

PSscheduler: A parameter synchronization scheduling algorithm for distributed machine learning in reconfigurable optical networks

L Liu, X Xu, P Zhou, X Chen, D Ergu, H Yu, G Sun… - Neurocomputing, 2025 - Elsevier
With the increasing size of training datasets and models, parameter synchronization stage
puts a heavy burden on the network, and communication has become one of the main …

Speech Recognition for Intelligent System in Service Robots: A Review

R Atika, S Dwijayanti… - … Conference on Electrical …, 2024 - ieeexplore.ieee.org
Speech recognition and response system technology in service robot research continues to
evolve alongside technical advances and the increasing demand for intelligent automation …

ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages

MF Qharabagh, Z Dehghanian, HR Rabiee - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we introduce ManaTTS, the most extensive publicly accessible single-speaker
Persian corpus, and a comprehensive framework for collecting transcribed speech datasets …

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

JM Acosta-Triana, D Gimeno-Gómez… - arXiv preprint arXiv …, 2024 - arxiv.org
More than 7,000 known languages are spoken around the world. However, due to the lack
of annotated resources, only a small fraction of them are currently covered by speech …

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

J Peymanfard, V Saeedi, MR Mohammadi… - arXiv preprint arXiv …, 2023 - arxiv.org
Lip reading is a challenging task that has many potential applications in speech recognition,
human-computer interaction, and security systems. However, existing lip reading systems …

Extending LIP-RTVE: Towards A Large-Scale Audio-Visual Dataset for Continuous Spanish in the Wild

M Zaragozá-Portolés, D Gimeno-Gómez… - Proc. IberSPEECH …, 2024 - isca-archive.org
This article presents the extension of the LIP-RTVE dataset, a dataset dedicated to the
Spanish language for advancing audiovisual speech technologies. The annotated corpus …

Audio-Visual Wake-up Word Spotting Under Noisy and Multi-person Scenarios

C Li, F Su, J Liu - International Conference on Pattern Recognition, 2024 - Springer
The existing audio-visual wake-up word spotting (AVWWS) methods assume that the audio
signal has been aligned with the lip movement video signal of a specific speaker in noisy …

Preserving Correlation in Multi-Modal Data: Challenges and Strategies

V Koria, N Bhatt, N Bhatt… - 2024 OPJU International …, 2024 - ieeexplore.ieee.org
With the exponential growth of social media, an immense volume of data is generated
across diverse modalities, each exhibiting distinct statistical properties. Despite these …