Attention based natural language grounding by navigating virtual environment

W Cao, Y Wu, Y Sun, H Zhang, J Ren… - … : Data Mining and …, 2023 - Wiley Online Library

Multimodal learning provides a path to fully utilize all types of information related to the
modeling target to provide the model with a global vision. Zero‐shot learning (ZSL) is a …

Cited by 29 · Related articles · All 3 versions

System transparency in shared autonomy: A mini review

V Alonso, P De La Puente - Frontiers in neurorobotics, 2018 - frontiersin.org

What does transparency mean in a shared autonomy framework? Different ways of
understanding system transparency in human-robot interaction can be found in the state of …

Cited by 112 · Related articles · All 7 versions

A survey on deep reinforcement learning for audio-based applications

S Latif, H Cuayáhuitl, F Pervez, F Shamshad… - Artificial Intelligence …, 2023 - Springer

Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence
(AI) by endowing autonomous systems with high levels of understanding of the real world …

Cited by 90 · Related articles · All 10 versions

SAC: Semantic attention composition for text-conditioned image retrieval

S Jandial, P Badjatiya, P Chawla… - Proceedings of the …, 2022 - openaccess.thecvf.com

The ability to efficiently search for images is essential for improving the user experiences
across various products. Incorporating user feedback, via multi-modal inputs, to navigate …

Cited by 45 · Related articles · All 5 versions

Coarse-to-fine fusion for language grounding in 3D navigation

TT Nguyen, AH Vo, SM Choi, YG Kim - Knowledge-Based Systems, 2023 - Elsevier

We present a new network whereby an agent navigates in the 3D environment to find a
target object according to a language-based instruction. Such a task is challenging because …

Cited by 2 · Related articles · All 4 versions

Trace: Transform aggregate and compose visiolinguistic representations for image search with text feedback

S Jandial, A Chopra, P Badjatiya… - arXiv preprint arXiv …, 2020 - researchgate.net

The ability to efficiently search for images over an indexed database is the cornerstone for
several user experiences. Incorporating user feedback, through multi-modal inputs provide …

Cited by 21 · Related articles

Multi-modal association based grouping for form structure extraction

M Aggarwal, M Sarkar, H Gupta… - Proceedings of the …, 2020 - openaccess.thecvf.com

Document structure extraction has been a widely researched area for decades. Recent work
in this direction has been deep learning-based, mostly focusing on extracting structure using …

Cited by 16 · Related articles · All 5 versions

Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition

M Zhang, T Tanaka, W Hou, S Gao, T Shinozaki - Interspeech, 2020 - interspeech2020.org

The process of spoken language acquisition based on sound-image grounding has been
one of the topics that has attracted the most significant interest of linguists and human …

Cited by 11 · Related articles · All 6 versions

Spoken language acquisition based on reinforcement learning and word unit segmentation

S Gao, W Hou, T Tanaka… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

The process of spoken-language acquisition has been one of the topics of greatest interest
to linguists for decades. By utilizing modern machine learning techniques, we simulated this …

Cited by 13 · Related articles · All 5 versions

DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding

K Ajayi, X Wei, M Gryder, W Shields, J Wu, SM Jones… - Scientific Data, 2023 - nature.com

Recent advances in computer vision (CV) and natural language processing have been
driven by exploiting big data on practical applications. However, these research fields are …

Cited by 3 · Related articles · All 13 versions