SPADE: Synthesizing Assertions for Large Language Model Pipelines

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

S Shankar, JD Zamfirescu-Pereira… - Proceedings of the 37th …, 2024 - dl.acm.org

Due to the cumbersome nature of human evaluation and limitations of code-based
evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in …

Cited by 39 · Related articles · All 2 versions

[PDF] vldb.org

spade: Synthesizing Data Quality Assertions for Large Language Model Pipelines

S Shankar, H Li, P Asawa, M Hulsebos, Y Lin… - Proceedings of the …, 2024 - dl.acm.org

Large language models (LLMs) are being increasingly deployed as part of pipelines that
repeatedly process or generate data of some sort. However, a common barrier to …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions

Z Chen, J Wang, M Xia, K Shigyo, D Liu… - … on Visualization and …, 2024 - ieeexplore.ieee.org

The integration of Large Language Models (LLMs), especially ChatGPT, into education is
poised to revolutionize students' learning experiences by introducing innovative …

Cited by 1 · Related articles · All 7 versions

[PDF] arxiv.org

ChainBuddy: An AI Agent System for Generating LLM Pipelines

J Zhang, I Arawjo - arXiv preprint arXiv:2409.13588, 2024 - arxiv.org

As large language models (LLMs) advance, their potential applications have grown
significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation

A Szymanski, SA Gebreegziabher, O Anuyah… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) are increasingly utilized for domain-specific tasks, yet
integrating domain expertise into evaluating their outputs remains challenging. A common …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

S Shankar, AG Parameswaran, E Wu - arXiv preprint arXiv:2410.12189, 2024 - arxiv.org

Analyzing unstructured data, such as complex documents, has been a persistent challenge
in data processing. Large Language Models (LLMs) have shown promise in this regard …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

RETAIN: Interactive Tool for Regression Testing Guided LLM Migration

T Dixit, D Lee, S Fang, SS Harsha, A Sureshan… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) are increasingly integrated into diverse applications. The
rapid evolution of LLMs presents opportunities for developers to enhance applications …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

Mixing Linters with GUIs: A Color Palette Design Probe

A McNutt, MC Stone, J Heer - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Visualization linters are end-user facing evaluators that automatically identify potential chart
issues. These spell-checker like systems offer a blend of interpretability and customization …

Constraint representation towards precise data-driven storytelling

YZ Shi, H Li, L Ruan, H Qu - 2024 IEEE VIS Workshop on Data …, 2024 - ieeexplore.ieee.org

Data-driven storytelling serves as a crucial bridge for communicating ideas in a persuasive
way. However, the manual creation of data stories is a multifaceted, labor-intensive, and …

Cited by 1 · Related articles · All 3 versions

[PDF] stefan-grafberger.com

[PDF] Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI

S Grafberger, Z Zhang, S Schelter… - IEEE Data Eng …, 2024 - stefan-grafberger.com

Software systems that learn from data with AI and machine learning (ML) are becoming
ubiquitous and are increasingly used to automate impactful decisions. The risks arising from …

Cited by 3 · Related articles · All 3 versions