Foundations and recent trends in multimodal mobile agents: A survey
Mobile agents are essential for automating tasks in complex and dynamic mobile
environments. As foundation models evolve, the demands for agents that can adapt in real …
Ferret-UI 2: Mastering universal user interface understanding across platforms
Building a generalist model for user interface (UI) understanding is challenging due to
various foundational issues, such as platform diversity, resolution variation, and data …
MindSearch: Mimicking human minds elicits deep AI searcher
Information seeking and integration is a complex cognitive task that consumes enormous
time and effort. Inspired by the remarkable progress of Large Language Models, recent …
Caution for the environment: Multimodal agents are susceptible to environmental distractions
This paper investigates the faithfulness of multimodal large language model (MLLM) agents
in the graphical user interface (GUI) environment, aiming to address the research question …
DistRL: An asynchronous distributed reinforcement learning framework for on-device control agents
On-device control agents, especially on mobile devices, are responsible for operating
mobile devices to fulfill users' requests, enabling seamless and intuitive interactions …
Agent-E: From autonomous web navigation to foundational design principles in agentic systems
AI Agents are changing the way work gets done, both in consumer and enterprise domains.
However, the design patterns and architectures to build highly capable agents or multi-agent …
Do multimodal foundation models understand enterprise workflows? A benchmark for business process management tasks
Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating
models on business process management (BPM) tasks. BPM is the practice of documenting …
AutoGLM: Autonomous foundation agents for GUIs
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation
agents for autonomous control of digital devices through Graphical User Interfaces (GUIs) …
Towards a science exocortex
KG Yager - Digital Discovery, 2024 - pubs.rsc.org
Artificial intelligence (AI) methods are poised to revolutionize intellectual work, with
generative AI enabling automation of text analysis, text generation, and simple decision …
Lightweight Neural App Control
This paper introduces a novel mobile phone control architecture, termed "app agents", for
efficient interactions and controls across various Android apps. The proposed Lightweight …