AboutMe: Using self-descriptions in webpages to document the effects of English pretraining data filters

L Lucy, S Gururangan, L Soldaini, E Strubell… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models' (LLMs) abilities are drawn from their pretraining data, and model
development begins with data curation. However, decisions around what data is retained or …

On the evaluation of machine-generated reports

J Mayfield, E Yang, D Lawrie, S MacAvaney… - Proceedings of the 47th …, 2024 - dl.acm.org
Large Language Models (LLMs) have enabled new ways to satisfy information needs.
Although great strides have been made in applying them to settings like document ranking …

FAMuS: Frames across multiple sources

S Vashishtha, A Martin, W Gantt, B Van Durme… - arXiv preprint arXiv …, 2023 - arxiv.org
Understanding event descriptions is a central aspect of language processing, but current
approaches focus overwhelmingly on single sentences or documents. Aggregating …

Report on The Search Futures Workshop at ECIR 2024

L Azzopardi, CLA Clarke, P Kantor, B Mitra… - ACM SIGIR Forum, 2024 - dl.acm.org
The First Search Futures Workshop, in conjunction with the Forty-sixth European
Conference on Information Retrieval (ECIR) 2024, looked into the future of search to ask …

Uncovering Differences in Persuasive Language in Russian versus English Wikipedia

B Li, A Panasyuk, C Callison-Burch - arXiv preprint arXiv:2409.19148, 2024 - arxiv.org
We study how differences in persuasive language across Wikipedia articles, written in either
English or Russian, can uncover each culture's distinct perspective on different subjects …

Uncovering deep-rooted cultural differences (UNCOVER)

A Panasyuk, B Li… - Disruptive Technologies in …, 2024 - spiedigitallibrary.org
This study delves into the interconnected realms of Debates, Fake News, and Propaganda,
with an emphasis on discerning prominent ideological underpinnings distinguishing …