A review of vector quantization techniques

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

Cited by 506 · Related articles · All 3 versions

[PDF] arxiv.org

SoundStream: An end-to-end neural audio codec

N Zeghidour, A Luebs, A Omran… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

We present SoundStream, a novel neural audio codec that can efficiently compress speech,
music and general audio at bitrates normally targeted by speech-tailored codecs …

Cited by 543 · Related articles · All 5 versions

[PDF] thecvf.com

From audio to photoreal embodiment: Synthesizing humans in conversations

E Ng, J Romero, T Bagautdinov, S Bai… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present a framework for generating full-bodied photorealistic avatars that gesture
according to the conversational dynamics of a dyadic interaction. Given speech audio we …

Cited by 16 · Related articles · All 3 versions

[PDF] arxiv.org

HiFi-Codec: Group-residual vector quantization for high fidelity audio codec

D Yang, S Liu, R Huang, J Tian, C Weng… - arXiv preprint arXiv …, 2023 - arxiv.org

Audio codec models are widely used in audio communication as a crucial technique for
compressing audio into discrete representations. Nowadays, audio codec models are …

Cited by 79 · Related articles · All 2 versions

kNN Classification: a review

PK Syriopoulos, NG Kalampalikis, SB Kotsiantis… - Annals of Mathematics …, 2023 - Springer

The k-nearest neighbors (k-NN) algorithm is a simple yet powerful non-parametric classifier
that is robust to noisy data and easy to implement. However, with the growing literature on …

Cited by 14 · Related articles

[PDF] arxiv.org

LauraGPT: Listen, attend, understand, and regenerate audio with GPT

Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative Pre-trained Transformer (GPT) models have achieved remarkable performance
on various natural language processing tasks, and have shown great potential as …

Cited by 31 · Related articles

[PDF] thecvf.com

Audio-visual speech codecs: Rethinking audio-visual speech enhancement by re-synthesis

K Yang, D Marković, S Krenn… - Proceedings of the …, 2022 - openaccess.thecvf.com

Since facial actions such as lip movements contain significant information about speech
content, it is not surprising that audio-visual speech enhancement methods are more …

Cited by 38 · Related articles · All 5 versions

[PDF] thecvf.com

AV-RIR: Audio-visual room impulse response estimation

A Ratnarajah, S Ghosh, S Kumar… - Proceedings of the …, 2024 - openaccess.thecvf.com

Accurate estimation of Room Impulse Response (RIR) which captures an
environment's acoustic properties is important for speech processing and AR/VR …

Cited by 9 · Related articles · All 4 versions

[PDF] arxiv.org

Make-A-Voice: Unified voice synthesis with discrete representation

R Huang, C Zhang, Y Wang, D Yang, L Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Various applications of voice synthesis have been developed independently despite the fact
that they generate "voice" as output in common. In addition, the majority of voice synthesis …

Cited by 24 · Related articles · All 2 versions

[PDF] arxiv.org

Behavior generation with latent actions

S Lee, Y Wang, H Etukuru, HJ Kim… - arXiv preprint arXiv …, 2024 - arxiv.org

Generative modeling of complex behaviors from labeled datasets has been a longstanding
problem in decision making. Unlike language or image generation, decision making …

Cited by 17 · Related articles · All 3 versions