Creating a multitrack classical music performance dataset for multimodal music analysis:...

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer

Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Cited by 162 | Related articles | All 12 versions

Image synthesis: a review of methods, datasets, evaluation metrics, and future outlook

SS Baraheem, TN Le, TV Nguyen - Artificial Intelligence Review, 2023 - Springer

Image synthesis is a process of converting the input text, sketch, or other sources, i.e., another
image or mask, into an image. It is an important problem in the computer vision field, where it …

Cited by 12 | Related articles | All 6 versions

Auto-regressive image synthesis with integrated quantization

F Zhan, Y Yu, R Wu, J Zhang, K Cui, C Zhang… - European Conference on …, 2022 - Springer

Deep generative models have achieved conspicuous progress in realistic image synthesis
with multifarious conditional inputs, while generating diverse yet high-fidelity images …

Cited by 52 | Related articles | All 8 versions

Music gesture for visual sound separation

C Gan, D Huang, H Zhao… - Proceedings of the …, 2020 - openaccess.thecvf.com

Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …

Cited by 205 | Related articles | All 9 versions

The sound of motions

H Zhao, C Gan, WC Ma… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact
that humans are capable of interpreting sound sources from how objects move visually, we …

Cited by 268 | Related articles | All 8 versions

Foley music: Learning to generate music from videos

C Gan, D Huang, P Chen, JB Tenenbaum… - Computer Vision–ECCV …, 2020 - Springer

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a
silent video clip about people playing musical instruments. We first identify two key …

Cited by 128 | Related articles | All 8 versions

MT3: Multi-task multitrack music transcription

J Gardner, I Simon, E Manilow, C Hawthorne… - arXiv preprint arXiv …, 2021 - arxiv.org

Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a
challenging task at the core of music understanding. Unlike Automatic Speech Recognition …

Cited by 75 | Related articles | All 5 versions

Taming visually guided sound generation

V Iashin, E Rahtu - arXiv preprint arXiv:2110.08791, 2021 - arxiv.org

Recent advances in visually-induced audio generation are based on sampling short,
low-fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the …

Cited by 66 | Related articles | All 6 versions

Multi-instrument music synthesis with spectrogram diffusion

C Hawthorne, I Simon, A Roberts, N Zeghidour… - arXiv preprint arXiv …, 2022 - arxiv.org

An ideal music synthesizer should be both interactive and expressive, generating
high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …

Cited by 44 | Related articles | All 4 versions

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

Q Kong, B Li, J Chen, Y Wang - arXiv preprint arXiv:2010.07061, 2020 - arxiv.org

Symbolic music datasets are important for music information retrieval and musical analysis.
However, there is a lack of large-scale symbolic datasets for classical piano music. In this …

Cited by 86 | Related articles | All 7 versions