Ego4D: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
Fine-grained image analysis with deep learning: A survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer
vision and pattern recognition, and underpins a diverse set of real-world applications. The …
Three-stream attention-aware network for RGB-D salient object detection
Previous RGB-D fusion systems based on convolutional neural networks typically employ a
two-stream architecture, in which RGB and depth inputs are learned independently. The …
Attribute-aware deep hashing with self-consistency for large-scale fine-grained image retrieval
Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images
depicting the concept of interests (i.e., the same sub-category labels) highest based on the …
A²-Net: Learning Attribute-Aware Hash Codes for Large-Scale Fine-Grained Image Retrieval
Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images
depicting the concept of interests (i.e., the same sub-category labels) highest based on the …
Food and ingredient joint learning for fine-grained recognition
Fine-grained food recognition is the detailed classification that provides more specialized
and professional attribute information of food. It is the basic work to realize healthy diet …
Online latent semantic hashing for cross-media retrieval
The hashing-based cross-media method has become an increasingly popular technique in
facilitating large-scale multimedia retrieval task, owing to its effectiveness and efficiency …
Multi-modal attribute prompting for vision-language models
Pre-trained Vision-Language Models (VLMs), like CLIP, exhibit strong generalization ability
to downstream tasks but struggle in few-shot scenarios. Existing prompting techniques …
TMFNet: Three-input multilevel fusion network for detecting salient objects in RGB-D images
The use of depth information, acquired by depth sensors, for salient object detection (SOD)
is being explored. Despite the remarkable results from recent deep learning approaches for …
Robust learning from noisy web data for fine-grained recognition
Due to DNNs' memorization effect, label noise lessens the performance of the web-
supervised fine-grained visual categorization task. Previous literature primarily relies on …