An overview of cross-media retrieval: Concepts, methodologies, benchmarks, and challenges

Y Peng, X Huang, Y Zhao - … on circuits and systems for video …, 2017 - ieeexplore.ieee.org
Multimedia retrieval plays an indispensable role in big data utilization. Past efforts mainly
focused on single-media retrieval. However, the requirements of users are highly flexible …

A comprehensive survey on cross-modal retrieval

K Wang, Q Yin, W Wang, S Wu, L Wang - arXiv preprint arXiv:1607.06215, 2016 - arxiv.org
In recent years, cross-modal retrieval has drawn much attention due to the rapid growth of
multimodal data. It takes one type of data as the query to retrieve relevant data of another …

YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video

E Real, J Shlens, S Mazzocchi… - proceedings of the …, 2017 - openaccess.thecvf.com
We introduce a new large-scale data set of video URLs with densely-sampled object
bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists …

A survey of multi-view representation learning

Y Li, M Yang, Z Zhang - IEEE transactions on knowledge and …, 2018 - ieeexplore.ieee.org
Recently, multi-view representation learning has become a rapidly growing direction in
machine learning and data mining areas. This paper introduces two categories for multi …

Framing image description as a ranking task: Data, models and evaluation metrics

M Hodosh, P Young, J Hockenmaier - Journal of Artificial Intelligence …, 2013 - jair.org
The ability to associate images with natural language sentences that describe what is
depicted in them is a hallmark of image understanding, and a prerequisite for applications …

Adaptive subgradient methods for online learning and stochastic optimization

J Duchi, E Hazan, Y Singer - Journal of machine learning research, 2011 - jmlr.org
We present a new family of subgradient methods that dynamically incorporate knowledge of
the geometry of the data observed in earlier iterations to perform more informative gradient …

A multi-view embedding space for modeling internet images, tags, and their semantics

Y Gong, Q Ke, M Isard, S Lazebnik - International journal of computer …, 2014 - Springer
This paper investigates the problem of modeling Internet images and associated text or tags
for tasks such as image-to-image search, tag-to-image search, and image-to-tag search …

A survey of approaches and trends in person re-identification

A Bedagkar-Gala, SK Shah - Image and vision computing, 2014 - Elsevier
Person re-identification is a fundamental task in automated video surveillance and has been
an area of intense research in the past few years. Given an image/video of a person taken …

Local binary patterns and its application to facial image analysis: a survey

D Huang, C Shan, M Ardabilian… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Local binary pattern (LBP) is a nonparametric descriptor, which efficiently summarizes the
local structures of images. In recent years, it has aroused increasing interest in many areas …

Predicting visual features from text for image and video caption retrieval

J Dong, X Li, CGM Snoek - IEEE Transactions on Multimedia, 2018 - ieeexplore.ieee.org
This paper strives to find amidst a set of sentences the one best describing the content of a
given image or video. Different from existing works, which rely on a joint subspace for their …