Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Neuro-symbolic speech understanding in aircraft maintenance metaverse

A Siyaev, GS Jo - IEEE Access, 2021 - ieeexplore.ieee.org
In the emerging world of metaverses, it is essential for speech communication systems to be
aware of context to interact with virtual assets in the 3D world. This paper proposes the …

Deep language models for interpretative and predictive materials science

Y Hu, MJ Buehler - APL Machine Learning, 2023 - pubs.aip.org
Machine learning (ML) has emerged as an indispensable methodology to describe,
discover, and predict complex physical phenomena that efficiently help us learn underlying …

A neuro-vector-symbolic architecture for solving Raven's progressive matrices

M Hersche, M Zeqiri, L Benini, A Sebastian… - Nature Machine …, 2023 - nature.com
Neither deep neural networks nor symbolic artificial intelligence (AI) alone has approached
the kind of intelligence expressed in humans. This is mainly because neural networks are …

Learning to predict visual attributes in the wild

K Pham, K Kafle, Z Lin, Z Ding… - Proceedings of the …, 2021 - openaccess.thecvf.com
Visual attributes constitute a large portion of information contained in a scene. Objects can
be described using a wide variety of attributes which portray their visual appearance (color …

Dynamic visual reasoning by learning differentiable physics models from video and language

M Ding, Z Chen, T Du, P Luo… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this work, we propose a unified framework, called Visual Reasoning with Differentiable
Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …

Concept decomposition for visual exploration and inspiration

Y Vinker, A Voynov, D Cohen-Or, A Shamir - ACM Transactions on …, 2023 - dl.acm.org
A creative idea is often born from transforming, combining, and modifying ideas from existing
visual examples capturing various concepts. However, one cannot simply copy the concept …

NS3D: Neuro-symbolic grounding of 3D objects and relations

J Hsu, J Mao, J Wu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Grounding object properties and relations in 3D scenes is a prerequisite for a wide range of
artificial intelligence tasks, such as visually grounded dialogues and embodied …

What's left? Concept grounding with logic-enhanced foundation models

J Hsu, J Mao, J Tenenbaum… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent works such as VisProg and ViperGPT have smartly composed foundation models for
visual reasoning—using large language models (LLMs) to produce programs that can be …

Abstract spatial-temporal reasoning via probabilistic abduction and execution

C Zhang, B Jia, SC Zhu, Y Zhu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Spatial-temporal reasoning is a challenging task in Artificial Intelligence (AI) due to its
demanding but unique nature: a theoretic requirement on representing and reasoning …