MegaWika: Millions of reports and their sources across 50 diverse languages

AboutMe: Using self-descriptions in webpages to document the effects of English pretraining data filters

L Lucy, S Gururangan, L Soldaini, E Strubell… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models' (LLMs) abilities are drawn from their pretraining data, and model
development begins with data curation. However, decisions around what data is retained or …

Cited by 11; related articles; all 4 versions

On the evaluation of machine-generated reports

J Mayfield, E Yang, D Lawrie, S MacAvaney… - Proceedings of the 47th …, 2024 - dl.acm.org

Large Language Models (LLMs) have enabled new ways to satisfy information needs.
Although great strides have been made in applying them to settings like document ranking …

Cited by 4; related articles; all 3 versions

FAMuS: Frames across multiple sources

S Vashishtha, A Martin, W Gantt, B Van Durme… - arXiv preprint arXiv …, 2023 - arxiv.org

Understanding event descriptions is a central aspect of language processing, but current
approaches focus overwhelmingly on single sentences or documents. Aggregating …

Cited by 2; related articles; all 3 versions

Report on The Search Futures Workshop at ECIR 2024

L Azzopardi, CLA Clarke, P Kantor, B Mitra… - ACM SIGIR Forum, 2024 - dl.acm.org

The First Search Futures Workshop, in conjunction with the Forty-sixth European
Conference on Information Retrieval (ECIR) 2024, looked into the future of search to ask …

Cited by 1; related articles; all 8 versions

Uncovering Differences in Persuasive Language in Russian versus English Wikipedia

B Li, A Panasyuk, C Callison-Burch - arXiv preprint arXiv:2409.19148, 2024 - arxiv.org

We study how differences in persuasive language across Wikipedia articles, written in either
English or Russian, can uncover each culture's distinct perspective on different subjects …

Uncovering deep-rooted cultural differences (UNCOVER)

A Panasyuk, B Li… - Disruptive Technologies in …, 2024 - spiedigitallibrary.org

This study delves into the interconnected realms of Debates, Fake News, and Propaganda,
with an emphasis on discerning prominent ideological underpinnings distinguishing …