LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

T Parcollet, H Nguyen, S Evain, MZ Boito… - Computer Speech & …, 2024 - Elsevier
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many
different domains including computer vision and natural language processing. Speech …

Audio-visual neural syntax acquisition

CIJ Lai, F Shi, P Peng, Y Kim, K Gimpel… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We study phrase structure induction from visually-grounded speech. The core idea is to first
segment the speech waveform into sequences of word segments, and subsequently induce …

Cascading and direct approaches to unsupervised constituency parsing on spoken sentences

Y Tseng, CIJ Lai, H Lee - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Past work on unsupervised parsing is constrained to written form. In this paper, we present
the first study on unsupervised spoken constituency parsing given unlabeled spoken …

Learning Language Structures through Grounding

F Shi - arXiv preprint arXiv:2406.09662, 2024 - arxiv.org
Language is highly structured, with syntactic and semantic structures, to some extent,
agreed upon by speakers of the same language. With implicit or explicit awareness of such …

Multi-source domain adaptation for dependency parsing via domain-aware feature generation

Y Li, Z Zhang, Y Xian, Z Yu, S Gao, C Mao… - International Journal of …, 2024 - Springer
With deep representation learning advances, supervised dependency parsing has achieved
a notable enhancement. However, when the training data is drawn from various predefined …

Semantic Role Labeling from Chinese Speech via End-to-End Learning

H Chen, X Li, M Zhang, M Zhang - Findings of the Association for …, 2024 - aclanthology.org
Semantic Role Labeling (SRL), crucial for understanding semantic relationships in
sentences, has traditionally focused on text-based input. However, the increasing use of …

Textless Dependency Parsing by Labeled Sequence Prediction

S Kando, Y Miyao, J Naradowsky… - arXiv preprint arXiv …, 2024 - arxiv.org
Traditional spoken language processing involves cascading an automatic speech
recognition (ASR) system into text processing models. In contrast, "textless" methods …

Wav2pos: Exploring syntactic analysis from audio for Highland Puebla Nahuatl

R Pugh, V Sreedhar, F Tyers - … of the 4th Workshop on Natural …, 2024 - aclanthology.org
We describe an approach to part-of-speech tagging from audio with very little human-
annotated data, for Highland Puebla Nahuatl, a low-resource language of Mexico. While …

Textless phrase structure induction from visually-grounded speech

CI Lai, F Shi, P Peng, Y Kim, K Gimpel, S Chang… - 2023 - openreview.net
We study phrase structure induction from visually-grounded speech without intermediate text
or text pre-trained models. The core idea is to first segment the speech waveform into …

PROPICTO: Developing Speech-to-Pictograph Translation Systems to Enhance Communication Accessibility

L Ormaechea, P Bouillon… - … Conference of The …, 2023 - hal.univ-grenoble-alpes.fr
PROPICTO is a project funded by the French National Research Agency and the Swiss
National Science Foundation, that aims at creating Speech-to-Pictograph translation …