Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Fine-grained image analysis with deep learning: A survey

XS Wei, YZ Song, O Mac Aodha, J Wu… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer
vision and pattern recognition, and underpins a diverse set of real-world applications. The …

Three-stream attention-aware network for RGB-D salient object detection

H Chen, Y Li - IEEE Transactions on Image Processing, 2019 - ieeexplore.ieee.org
Previous RGB-D fusion systems based on convolutional neural networks typically employ a
two-stream architecture, in which RGB and depth inputs are learned independently. The …

Attribute-aware deep hashing with self-consistency for large-scale fine-grained image retrieval

XS Wei, Y Shen, X Sun, P Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images
depicting the concept of interests (i.e., the same sub-category labels) highest based on the …

A²-Net: Learning Attribute-Aware Hash Codes for Large-Scale Fine-Grained Image Retrieval

XS Wei, Y Shen, X Sun, HJ Ye… - Advances in Neural …, 2021 - proceedings.neurips.cc
Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images
depicting the concept of interests (i.e., the same sub-category labels) highest based on the …

Food and ingredient joint learning for fine-grained recognition

C Liu, Y Liang, Y Xue, X Qian… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Fine-grained food recognition is the detailed classification that provides more specialized
and professional attribute information of food. It is the basic work to realize healthy diet …

Online latent semantic hashing for cross-media retrieval

T Yao, G Wang, L Yan, X Kong, Q Su, C Zhang… - Pattern Recognition, 2019 - Elsevier
Hashing-based cross-media methods have become an increasingly popular technique in
facilitating large-scale multimedia retrieval tasks, owing to their effectiveness and efficiency …

Multi-modal attribute prompting for vision-language models

X Liu, J Wu, W Yang, X Zhou… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Pre-trained Vision-Language Models (VLMs), like CLIP, exhibit strong generalization ability
to downstream tasks but struggle in few-shot scenarios. Existing prompting techniques …

TMFNet: Three-input multilevel fusion network for detecting salient objects in RGB-D images

W Zhou, S Pan, J Lei, L Yu - IEEE Transactions on Emerging …, 2021 - ieeexplore.ieee.org
The use of depth information, acquired by depth sensors, for salient object detection (SOD)
is being explored. Despite the remarkable results from recent deep learning approaches for …

Robust learning from noisy web data for fine-grained recognition

Z Cai, GS Xie, X Huang, D Huang, Y Yao, Z Tang - Pattern Recognition, 2023 - Elsevier
Due to DNNs' memorization effect, label noise lessens the performance of the
web-supervised fine-grained visual categorization task. Previous literature primarily relies on …