Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
S Shankar, JD Zamfirescu-Pereira… - Proceedings of the 37th …, 2024 - dl.acm.org
Due to the cumbersome nature of human evaluation and limitations of code-based
evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in …
SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines
Large language models (LLMs) are being increasingly deployed as part of pipelines that
repeatedly process or generate data of some sort. However, a common barrier to …
StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions
Z Chen, J Wang, M Xia, K Shigyo, D Liu… - … on Visualization and …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs), especially ChatGPT, into education is
poised to revolutionize students' learning experiences by introducing innovative …
ChainBuddy: An AI Agent System for Generating LLM Pipelines
J Zhang, I Arawjo - arXiv preprint arXiv:2409.13588, 2024 - arxiv.org
As large language models (LLMs) advance, their potential applications have grown
significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks …
Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation
Large Language Models (LLMs) are increasingly utilized for domain-specific tasks, yet
integrating domain expertise into evaluating their outputs remains challenging. A common …
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Analyzing unstructured data, such as complex documents, has been a persistent challenge
in data processing. Large Language Models (LLMs) have shown promise in this regard …
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration
Large Language Models (LLMs) are increasingly integrated into diverse applications. The
rapid evolution of LLMs presents opportunities for developers to enhance applications …
Mixing Linters with GUIs: A Color Palette Design Probe
Visualization linters are end-user facing evaluators that automatically identify potential chart
issues. These spell-checker like systems offer a blend of interpretability and customization …
Constraint representation towards precise data-driven storytelling
Data-driven storytelling serves as a crucial bridge for communicating ideas in a persuasive
way. However, the manual creation of data stories is a multifaceted, labor-intensive, and …
Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI
S Grafberger, Z Zhang, S Schelter… - IEEE Data Eng …, 2024 - stefan-grafberger.com
Software systems that learn from data with AI and machine learning (ML) are becoming
ubiquitous and are increasingly used to automate impactful decisions. The risks arising from …