OS-Copilot: Towards generalist computer agents with self-improvement

Z Wu, C Han, Z Ding, Z Weng, Z Liu, S Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous interaction with the computer has been a longstanding challenge with great
potential, and the recent proliferation of large language models (LLMs) has markedly …

Mobile-Bench: An evaluation benchmark for LLM-based mobile agents

S Deng, W Xu, H Sun, W Liu, T Tan, J Liu, A Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With the remarkable advancements of large language models (LLMs), LLM-based agents
have become a research hotspot in human-computer interaction. However, there is a …

PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion

Z Zhang, Y Guo, Y Liang, D Zhao, N Duan - arXiv preprint arXiv …, 2024 - arxiv.org
The growing dependence on Large Language Models (LLMs) for finishing user instructions
necessitates a comprehensive understanding of their robustness to complex task completion …

Large Language Models Can Learn Representation in Natural Language

Y Guo, Y Liang, D Zhao, N Duan - Findings of the Association for …, 2024 - aclanthology.org
One major challenge for Large Language Models (LLMs) is completing complex tasks
involving multiple entities, such as tool APIs. To tackle this, one approach is to retrieve …

Evaluation of topic continuity using nonlinearized naive Bayes with attention mechanism

ST Pi, P Bagavan, Y Li, Q Liu - 2024 - amazon.science
Utilizing Large Language Models (LLM) as chatbots in diverse business scenarios often
presents the challenge of maintaining topic continuity. Abrupt shifts in topics can lead to poor …

Towards Assisted Excellence: Designing an AI-Based System for Presentation Slide Evaluation

I Blohm - alexandria.unisg.ch
Creating and disseminating high-quality presentation slides have become a foundation for
effective communication in educational, corporate, and scientific domains. This study …