MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. A Baade, P Peng, D Harwath. Interspeech 2022. Cited by 93.
Word discovery in visually grounded, self-supervised speech models. P Peng, D Harwath. Interspeech 2022. Cited by 39.
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. P Peng, B Yan, S Watanabe, D Harwath. Interspeech 2023. Cited by 35.
Fast-slow transformer for visually grounding speech. P Peng, D Harwath. ICASSP 2022. Cited by 32.
Self-supervised representation learning for speech using visual grounding and masked language modeling. P Peng, D Harwath. AAAI 2022 SAS Workshop. Cited by 28.
A correspondence variational autoencoder for unsupervised acoustic word embeddings. P Peng, H Kamper, K Livescu. NeurIPS 2020 SAS Workshop. Cited by 17.
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild. P Peng, PY Huang, D Li, A Mohamed, D Harwath. arXiv preprint arXiv:2403.16973, 2024. Cited by 10.
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. Y Tseng, L Berry*, YT Chen*, I Chiu*, HH Lin*, M Liu*, P Peng*, YJ Shih*, ... Preprint, 2023. Cited by 6.
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. P Peng, SW Li, O Räsänen, A Mohamed, D Harwath. Interspeech 2023. Cited by 4.
BAT: Learning to Reason about Spatial Sounds with Large Language Models. Z Zheng, P Peng, Z Ma, X Chen, E Choi, D Harwath. arXiv preprint arXiv:2402.01591, 2024. Cited by 3.
Audio-Visual Neural Syntax Acquisition. CIJ Lai*, F Shi*, P Peng*, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ... ASRU 2023. Cited by 3.
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. HF Wang, YJ Shih, HJ Chang, L Berry, P Peng, H Lee, HM Wang, ... arXiv preprint arXiv:2402.06959, 2024. Cited by 2.
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. C Hori, P Peng, D Harwath, X Liu, K Ota, S Jain, R Corcodel, D Jha, ... Interspeech 2023. Cited by 2.
Zero-shot Video Moment Retrieval With Off-the-Shelf Models. A Diwan*, P Peng*, RJ Mooney (* denotes equal contribution). NeurIPS 2022 TL4NLP. Cited by 2.
Textless phrase structure induction from visually-grounded speech. CI Lai, F Shi, P Peng, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ... 2023. Cited by 1.
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos. C Chen, P Peng, A Baid, Z Xue, WN Hsu, D Harwath, K Grauman. arXiv preprint arXiv:2406.09272, 2024.
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model. HC Fang, NX Ye, YJ Shih, P Peng, HF Wang, L Berry, H Lee, D Harwath. arXiv preprint arXiv:2402.05819, 2024.
Neural Codec Language Models for Disentangled and Textless Voice Conversion. A Baade, P Peng, D Harwath.