Transfer learning-based nonstationary traffic flow prediction using AdaRNN and DCORAL
L Zang, T Wang, B Zhang, C Li - Expert Systems with Applications, 2024 - Elsevier
Traffic flow prediction is an integral part of an intelligent transportation system (ITS) for
proactive transportation planning and management in public transit network systems …
Audio–visual speech recognition based on regulated transformer and spatio–temporal fusion strategy for driver assistive systems
This article presents a research methodology for audio–visual speech recognition (AVSR) in
driver assistive systems. These systems necessitate ongoing interaction with drivers while …
PSscheduler: A parameter synchronization scheduling algorithm for distributed machine learning in reconfigurable optical networks
With the increasing size of training datasets and models, parameter synchronization stage
puts a heavy burden on the network, and communication has become one of the main …
Speech Recognition for Intelligent System in Service Robots: A Review
R Atika, S Dwijayanti… - … Conference on Electrical …, 2024 - ieeexplore.ieee.org
Speech recognition and response system technology in service robot research continues to
evolve alongside technical advances and the increasing demand for intelligent automation …
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
MF Qharabagh, Z Dehghanian, HR Rabiee - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we introduce ManaTTS, the most extensive publicly accessible single-speaker
Persian corpus, and a comprehensive framework for collecting transcribed speech datasets …
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies
JM Acosta-Triana, D Gimeno-Gómez… - arXiv preprint arXiv …, 2024 - arxiv.org
More than 7,000 known languages are spoken around the world. However, due to the lack
of annotated resources, only a small fraction of them are currently covered by speech …
Leveraging Visemes for Better Visual Speech Representation and Lip Reading
Lip reading is a challenging task that has many potential applications in speech recognition,
human-computer interaction, and security systems. However, existing lip reading systems …
Extending LIP-RTVE: Towards A Large-Scale Audio-Visual Dataset for Continuous Spanish in the Wild
M Zaragozá-Portolés, D Gimeno-Gómez… - Proc. IberSPEECH …, 2024 - isca-archive.org
This article presents the extension of the LIP-RTVE dataset, a dataset dedicated to the
Spanish language for advancing audiovisual speech technologies. The annotated corpus …
Audio-Visual Wake-up Word Spotting Under Noisy and Multi-person Scenarios
C Li, F Su, J Liu - International Conference on Pattern Recognition, 2024 - Springer
The existing audio-visual wake-up word spotting (AVWWS) methods assume that the audio
signal has been aligned with the lip movement video signal of a specific speaker in noisy …
Preserving Correlation in Multi-Modal Data: Challenges and Strategies
With the exponential growth of social media, an immense volume of data is generated
across diverse modalities, each exhibiting distinct statistical properties. Despite these …