- Academic resource search

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

Cited by 56 · Related articles · All 3 versions

[PDF] thecvf.com

NTIRE 2024 challenge on short-form UGC video quality assessment: Methods and results

X Li, K Yuan, Y Pei, Y Lu, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality
Assessment (S-UGC VQA) where various excellent solutions are submitted and evaluated …

Cited by 21 · Related articles · All 3 versions

[PDF] thecvf.com

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract: This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Cited by 209 · Related articles · All 6 versions

[PDF] arxiv.org

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Cited by 656 · Related articles · All 6 versions

[PDF] thecvf.com

Anydoor: Zero-shot object-level image customization

X Chen, L Huang, Y Liu, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …

Cited by 117 · Related articles · All 3 versions

[PDF] thecvf.com

Eyes wide shut? exploring the visual shortcomings of multimodal llms

S Tong, Z Liu, Y Zhai, Y Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com

Is vision good enough for language? Recent advancements in multimodal models primarily
stem from the powerful reasoning abilities of large language models (LLMs). However, the …

Cited by 89 · Related articles · All 4 versions

[PDF] ieee.org

End-to-end autonomous driving: Challenges and frontiers

L Chen, P Wu, K Chitta, B Jaeger… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

The autonomous driving community has witnessed a rapid growth in approaches that
embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle …

Cited by 128 · Related articles · All 4 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 132 · Related articles · All 6 versions

Towards a general-purpose foundation model for computational pathology

RJ Chen, T Ding, MY Lu, DFK Williamson, G Jaume… - Nature Medicine, 2024 - nature.com

Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …

Cited by 117 · Related articles · All 3 versions

[PDF] thecvf.com

Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers

ZX Zou, Z Yu, YC Guo, Y Li, D Liang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advancements in 3D reconstruction from single images have been driven by the
evolution of generative models. Prominent among these are methods based on Score …

Cited by 64 · Related articles · All 3 versions