A review on multimodal zero‐shot learning

W Cao, Y Wu, Y Sun, H Zhang, J Ren… - … : Data Mining and …, 2023 - Wiley Online Library
Multimodal learning provides a path to fully utilize all types of information related to the
modeling target to provide the model with a global vision. Zero‐shot learning (ZSL) is a …

System transparency in shared autonomy: A mini review

V Alonso, P De La Puente - Frontiers in neurorobotics, 2018 - frontiersin.org
What does transparency mean in a shared autonomy framework? Different ways of
understanding system transparency in human-robot interaction can be found in the state of …

A survey on deep reinforcement learning for audio-based applications

S Latif, H Cuayáhuitl, F Pervez, F Shamshad… - Artificial Intelligence …, 2023 - Springer
Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence
(AI) by endowing autonomous systems with high levels of understanding of the real world …

SAC: Semantic attention composition for text-conditioned image retrieval

S Jandial, P Badjatiya, P Chawla… - Proceedings of the …, 2022 - openaccess.thecvf.com
The ability to efficiently search for images is essential for improving the user experiences
across various products. Incorporating user feedback, via multi-modal inputs, to navigate …

Coarse-to-fine fusion for language grounding in 3D navigation

TT Nguyen, AH Vo, SM Choi, YG Kim - Knowledge-Based Systems, 2023 - Elsevier
We present a new network whereby an agent navigates in the 3D environment to find a
target object according to a language-based instruction. Such a task is challenging because …

Trace: Transform aggregate and compose visiolinguistic representations for image search with text feedback

S Jandial, A Chopra, P Badjatiya… - arXiv preprint arXiv …, 2020 - researchgate.net
The ability to efficiently search for images over an indexed database is the cornerstone for
several user experiences. Incorporating user feedback, through multi-modal inputs provide …

Multi-modal association based grouping for form structure extraction

M Aggarwal, M Sarkar, H Gupta… - Proceedings of the …, 2020 - openaccess.thecvf.com
Document structure extraction has been a widely researched area for decades. Recent work
in this direction has been deep learning-based, mostly focusing on extracting structure using …

Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition

M Zhang, T Tanaka, W Hou, S Gao, T Shinozaki - Interspeech, 2020 - interspeech2020.org
The process of spoken language acquisition based on sound-image grounding has been
one of the topics that has attracted the most significant interest of linguists and human …

Spoken language acquisition based on reinforcement learning and word unit segmentation

S Gao, W Hou, T Tanaka… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
The process of spoken-language acquisition has been one of the topics of greatest interest
to linguists for decades. By utilizing modern machine learning techniques, we simulated this …

DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding

K Ajayi, X Wei, M Gryder, W Shields, J Wu, SM Jones… - Scientific Data, 2023 - nature.com
Recent advances in computer vision (CV) and natural language processing have been
driven by exploiting big data on practical applications. However, these research fields are …