A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …
NTIRE 2024 challenge on short-form UGC video quality assessment: Methods and results
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality
Assessment (S-UGC VQA) where various excellent solutions are submitted and evaluated …
Depth anything: Unleashing the power of large-scale unlabeled data
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …
A survey on multimodal large language models
The Multimodal Large Language Model (MLLM) has recently become a rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …
AnyDoor: Zero-shot object-level image customization
This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …
Eyes wide shut? Exploring the visual shortcomings of multimodal LLMs
Is vision good enough for language? Recent advancements in multimodal models primarily
stem from the powerful reasoning abilities of large language models (LLMs). However, the …
End-to-end autonomous driving: Challenges and frontiers
The autonomous driving community has witnessed a rapid growth in approaches that
embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
Towards a general-purpose foundation model for computational pathology
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …
Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers
Recent advancements in 3D reconstruction from single images have been driven by the
evolution of generative models. Prominent among these are methods based on Score …