A discriminative kernel-based approach to rank images from text queries

An overview of cross-media retrieval: Concepts, methodologies, benchmarks, and challenges

Y Peng, X Huang, Y Zhao - … on circuits and systems for video …, 2017 - ieeexplore.ieee.org

Multimedia retrieval plays an indispensable role in big data utilization. Past efforts mainly
focused on single-media retrieval. However, the requirements of users are highly flexible …

Cited by 351 - Related articles - All 4 versions

A comprehensive survey on cross-modal retrieval

K Wang, Q Yin, W Wang, S Wu, L Wang - arXiv preprint arXiv:1607.06215, 2016 - arxiv.org

In recent years, cross-modal retrieval has drawn much attention due to the rapid growth of
multimodal data. It takes one type of data as the query to retrieve relevant data of another …

Cited by 362 - Related articles - All 2 versions

YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video

E Real, J Shlens, S Mazzocchi… - proceedings of the …, 2017 - openaccess.thecvf.com

We introduce a new large-scale data set of video URLs with densely-sampled object
bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists …

Cited by 704 - Related articles - All 11 versions

A survey of multi-view representation learning

Y Li, M Yang, Z Zhang - IEEE transactions on knowledge and …, 2018 - ieeexplore.ieee.org

Recently, multi-view representation learning has become a rapidly growing direction in
machine learning and data mining areas. This paper introduces two categories for multi …

Cited by 557 - Related articles - All 5 versions

Framing image description as a ranking task: Data, models and evaluation metrics

M Hodosh, P Young, J Hockenmaier - Journal of Artificial Intelligence …, 2013 - jair.org

The ability to associate images with natural language sentences that describe what is
depicted in them is a hallmark of image understanding, and a prerequisite for applications …

Cited by 1554 - Related articles - All 17 versions

Adaptive subgradient methods for online learning and stochastic optimization

J Duchi, E Hazan, Y Singer - Journal of machine learning research, 2011 - jmlr.org

We present a new family of subgradient methods that dynamically incorporate knowledge of
the geometry of the data observed in earlier iterations to perform more informative gradient …

Cited by 14192 - Related articles - All 25 versions

A multi-view embedding space for modeling internet images, tags, and their semantics

Y Gong, Q Ke, M Isard, S Lazebnik - International journal of computer …, 2014 - Springer

This paper investigates the problem of modeling Internet images and associated text or tags
for tasks such as image-to-image search, tag-to-image search, and image-to-tag search …

Cited by 713 - Related articles - All 19 versions

A survey of approaches and trends in person re-identification

A Bedagkar-Gala, SK Shah - Image and vision computing, 2014 - Elsevier

Person re-identification is a fundamental task in automated video surveillance and has been
an area of intense research in the past few years. Given an image/video of a person taken …

Cited by 551 - Related articles - All 11 versions

Local binary patterns and its application to facial image analysis: a survey

D Huang, C Shan, M Ardabilian… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org

Local binary pattern (LBP) is a nonparametric descriptor, which efficiently summarizes the
local structures of images. In recent years, it has aroused increasing interest in many areas …

Cited by 1234 - Related articles - All 18 versions

Predicting visual features from text for image and video caption retrieval

J Dong, X Li, CGM Snoek - IEEE Transactions on Multimedia, 2018 - ieeexplore.ieee.org

This paper strives to find amidst a set of sentences the one best describing the content of a
given image or video. Different from existing works, which rely on a joint subspace for their …

Cited by 248 - Related articles - All 9 versions