Puyuan Peng
Research Intern, Meta; PhD student, The University of Texas at Austin
Verified email at utexas.edu - Homepage
Title · Cited by · Year
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
A Baade, P Peng, D Harwath
Interspeech 2022, 2022
93 · 2022
Word discovery in visually grounded, self-supervised speech models
P Peng, D Harwath
Interspeech 2022, 2022
39 · 2022
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
P Peng, B Yan, S Watanabe, D Harwath
Interspeech 2023, 2023
35 · 2023
Fast-slow transformer for visually grounding speech
P Peng, D Harwath
ICASSP 2022, 2022
32 · 2022
Self-supervised representation learning for speech using visual grounding and masked language modeling
P Peng, D Harwath
AAAI 2022 SAS Workshop, 2022
28 · 2022
A correspondence variational autoencoder for unsupervised acoustic word embeddings
P Peng, H Kamper, K Livescu
NeurIPS 2020 SAS Workshop, 2020
17 · 2020
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
P Peng, PY Huang, D Li, A Mohamed, D Harwath
arXiv preprint arXiv:2403.16973, 2024
10 · 2024
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Y Tseng, L Berry*, YT Chen*, I Chiu*, HH Lin*, M Liu*, P Peng*, YJ Shih*, ...
preprint, 2023
6 · 2023
Syllable segmentation and cross-lingual generalization in a visually grounded, self-supervised speech model
P Peng, SW Li, O Räsänen, A Mohamed, D Harwath
Interspeech, 2023
5 · 2023
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
P Peng, SW Li, O Räsänen, A Mohamed, D Harwath
Interspeech 2023, 2023
4 · 2023
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Z Zheng, P Peng, Z Ma, X Chen, E Choi, D Harwath
arXiv preprint arXiv:2402.01591, 2024
3 · 2024
Audio-Visual Neural Syntax Acquisition
CIJ Lai*, F Shi*, P Peng*, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ...
ASRU 2023, 2023
3 · 2023
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
HF Wang, YJ Shih, HJ Chang, L Berry, P Peng, H Lee, HM Wang, ...
arXiv preprint arXiv:2402.06959, 2024
2 · 2024
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos
C Hori, P Peng, D Harwath, X Liu, K Ota, S Jain, R Corcodel, D Jha, ...
Interspeech 2023, 2023
2 · 2023
Zero-shot Video Moment Retrieval With Off-the-Shelf Models
A Diwan*, P Peng*, RJ Mooney (* denotes equal contribution)
NeurIPS 2022 TL4NLP, 2022
2 · 2022
Textless phrase structure induction from visually-grounded speech
CI Lai, F Shi, P Peng, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ...
1 · 2023
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
C Chen, P Peng, A Baid, Z Xue, WN Hsu, D Harwath, K Grauman
arXiv preprint arXiv:2406.09272, 2024
2024
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
HC Fang, NX Ye, YJ Shih, P Peng, HF Wang, L Berry, H Lee, D Harwath
arXiv preprint arXiv:2402.05819, 2024
2024
Neural Codec Language Models for Disentangled and Textless Voice Conversion
A Baade, P Peng, D Harwath
Articles 1–19