Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

S Shankar, JD Zamfirescu-Pereira… - Proceedings of the 37th …, 2024 - dl.acm.org
Due to the cumbersome nature of human evaluation and limitations of code-based
evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in …

SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines

S Shankar, H Li, P Asawa, M Hulsebos, Y Lin… - Proceedings of the …, 2024 - dl.acm.org
Large language models (LLMs) are being increasingly deployed as part of pipelines that
repeatedly process or generate data of some sort. However, a common barrier to …

StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions

Z Chen, J Wang, M Xia, K Shigyo, D Liu… - … on Visualization and …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs), especially ChatGPT, into education is
poised to revolutionize students' learning experiences by introducing innovative …

ChainBuddy: An AI Agent System for Generating LLM Pipelines

J Zhang, I Arawjo - arXiv preprint arXiv:2409.13588, 2024 - arxiv.org
As large language models (LLMs) advance, their potential applications have grown
significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks …

Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation

A Szymanski, SA Gebreegziabher, O Anuyah… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are increasingly utilized for domain-specific tasks, yet
integrating domain expertise into evaluating their outputs remains challenging. A common …

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

S Shankar, AG Parameswaran, E Wu - arXiv preprint arXiv:2410.12189, 2024 - arxiv.org
Analyzing unstructured data, such as complex documents, has been a persistent challenge
in data processing. Large Language Models (LLMs) have shown promise in this regard …

RETAIN: Interactive Tool for Regression Testing Guided LLM Migration

T Dixit, D Lee, S Fang, SS Harsha, A Sureshan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are increasingly integrated into diverse applications. The
rapid evolution of LLMs presents opportunities for developers to enhance applications …

Mixing Linters with GUIs: A Color Palette Design Probe

A McNutt, MC Stone, J Heer - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visualization linters are end-user facing evaluators that automatically identify potential chart
issues. These spell-checker like systems offer a blend of interpretability and customization …

Constraint representation towards precise data-driven storytelling

YZ Shi, H Li, L Ruan, H Qu - 2024 IEEE VIS Workshop on Data …, 2024 - ieeexplore.ieee.org
Data-driven storytelling serves as a crucial bridge for communicating ideas in a persuasive
way. However, the manual creation of data stories is a multifaceted, labor-intensive, and …

Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI.

S Grafberger, Z Zhang, S Schelter… - IEEE Data Eng …, 2024 - stefan-grafberger.com
Software systems that learn from data with AI and machine learning (ML) are becoming
ubiquitous and are increasingly used to automate impactful decisions. The risks arising from …