From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
A comprehensive survey of deep learning for image captioning
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …
A survey on multimodal large language models
Multimodal Large Language Models (MLLMs) have recently become a rising research
hotspot; they use powerful Large Language Models (LLMs) as a brain to perform …
Attention on attention for image captioning
Attention mechanisms are widely used in current encoder/decoder frameworks of image
captioning, where a weighted average on encoded vectors is generated at each time step to …
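The mechanism described in this snippet (a weighted average over the encoded vectors, recomputed at each decoding time step) can be sketched as follows. This is a minimal illustration assuming plain dot-product scoring and a softmax over those scores; it shows generic soft attention, not the "attention on attention" module the paper proposes, and the function name `soft_attention` is hypothetical.

```python
import math
from typing import List, Tuple


def soft_attention(query: List[float],
                   encoded: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical sketch: one decoding step of soft attention.

    Assumes dot-product scores between the decoder query and each encoded
    vector; returns (context vector, attention weights).
    """
    # Score each encoded vector against the current decoder query.
    scores = [sum(q * e for q, e in zip(query, vec)) for vec in encoded]
    # Numerically stable softmax turns scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Context vector: weighted average of the encoded vectors.
    dim = len(encoded[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, encoded))
               for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # toy "encoded image regions"
    # A different query at each time step yields a different weighting.
    for step_query in ([0.0, 0.0], [10.0, 0.0]):
        ctx, w = soft_attention(step_query, regions)
        print(step_query, [round(x, 4) for x in w], [round(x, 4) for x in ctx])
```

In a captioning decoder, `query` would typically be the decoder's hidden state at the current step, so the weighting shifts as each word is generated.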
Deep learning for text style transfer: A survey
Text style transfer is an important task in natural language generation, which aims to control
certain attributes in the generated text, such as politeness, emotion, humor, and many …
The design and implementation of XiaoIce, an empathetic social chatbot
This article describes the development of Microsoft XiaoIce, the most popular social chatbot
in the world. XiaoIce is uniquely designed as an artificial intelligence companion with an …
Multimodal intelligence: Representation learning, information fusion, and applications
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …
Controlled text generation with natural language instructions
Large language models can be prompted to produce fluent output for a wide range of tasks
without being specifically trained to do so. Nevertheless, it is notoriously difficult to control …
From Eliza to XiaoIce: challenges and opportunities with social chatbots
Conversational systems have come a long way since their inception in the 1960s. After
decades of research and development, we have seen progress from Eliza and Parry in the …
Delete, retrieve, generate: a simple approach to sentiment and style transfer
We consider the task of text attribute transfer: transforming a sentence to alter a specific
attribute (e.g., sentiment) while preserving its attribute-independent content (e.g., changing …