LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many
different domains including computer vision and natural language processing. Speech …
Audio-visual neural syntax acquisition
We study phrase structure induction from visually-grounded speech. The core idea is to first
segment the speech waveform into sequences of word segments, and subsequently induce …
Cascading and direct approaches to unsupervised constituency parsing on spoken sentences
Past work on unsupervised parsing is constrained to written form. In this paper, we present
the first study on unsupervised spoken constituency parsing given unlabeled spoken …
Learning Language Structures through Grounding
F Shi - arXiv preprint arXiv:2406.09662, 2024 - arxiv.org
Language is highly structured, with syntactic and semantic structures, to some extent,
agreed upon by speakers of the same language. With implicit or explicit awareness of such …
Multi-source domain adaptation for dependency parsing via domain-aware feature generation
Y Li, Z Zhang, Y Xian, Z Yu, S Gao, C Mao… - International Journal of …, 2024 - Springer
With deep representation learning advances, supervised dependency parsing has achieved
a notable enhancement. However, when the training data is drawn from various predefined …
Semantic Role Labeling from Chinese Speech via End-to-End Learning
Abstract Semantic Role Labeling (SRL), crucial for understanding semantic relationships in
sentences, has traditionally focused on text-based input. However, the increasing use of …
Textless Dependency Parsing by Labeled Sequence Prediction
S Kando, Y Miyao, J Naradowsky… - arXiv preprint arXiv …, 2024 - arxiv.org
Traditional spoken language processing involves cascading an automatic speech
recognition (ASR) system into text processing models. In contrast, "textless" methods …
Wav2pos: Exploring syntactic analysis from audio for Highland Puebla Nahuatl
We describe an approach to part-of-speech tagging from audio with very little human-
annotated data, for Highland Puebla Nahuatl, a low-resource language of Mexico. While …
Textless phrase structure induction from visually-grounded speech
We study phrase structure induction from visually-grounded speech without intermediate text
or text pre-trained models. The core idea is to first segment the speech waveform into …
PROPICTO: Developing Speech-to-Pictograph Translation Systems to Enhance Communication Accessibility
L Ormaechea, P Bouillon… - … Conference of The …, 2023 - hal.univ-grenoble-alpes.fr
PROPICTO is a project funded by the French National Research Agency and the Swiss
National Science Foundation, that aims at creating Speech-to-Pictograph translation …