From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Attention on attention for image captioning

L Huang, W Wang, J Chen… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Attention mechanisms are widely used in current encoder/decoder frameworks of image
captioning, where a weighted average on encoded vectors is generated at each time step to …

Deep learning for text style transfer: A survey

D Jin, Z Jin, Z Hu, O Vechtomova… - Computational …, 2022 - direct.mit.edu
Text style transfer is an important task in natural language generation, which aims to control
certain attributes in the generated text, such as politeness, emotion, humor, and many …

The design and implementation of XiaoIce, an empathetic social chatbot

L Zhou, J Gao, D Li, HY Shum - Computational Linguistics, 2020 - direct.mit.edu
This article describes the development of Microsoft XiaoIce, the most popular social chatbot
in the world. XiaoIce is uniquely designed as an artificial intelligence companion with an …

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Controlled text generation with natural language instructions

W Zhou, YE Jiang, E Wilcox… - International …, 2023 - proceedings.mlr.press
Large language models can be prompted to produce fluent output for a wide range of tasks
without being specifically trained to do so. Nevertheless, it is notoriously difficult to control …

From Eliza to XiaoIce: challenges and opportunities with social chatbots

HY Shum, X He, D Li - Frontiers of Information Technology & Electronic …, 2018 - Springer
Conversational systems have come a long way since their inception in the 1960s. After
decades of research and development, we have seen progress from Eliza and Parry in the …

Delete, retrieve, generate: a simple approach to sentiment and style transfer

J Li, R Jia, H He, P Liang - arXiv preprint arXiv:1804.06437, 2018 - arxiv.org
We consider the task of text attribute transfer: transforming a sentence to alter a specific
attribute (e.g., sentiment) while preserving its attribute-independent content (e.g., changing" …