Automatic text summarization methods: A comprehensive review

G Sharma, D Sharma - SN Computer Science, 2022 - Springer
Text summarization is the process of condensing a long text into a shorter version by
maintaining the key information and its meaning. Automatic text summarization can save …

Large language model as attributed training data generator: A tale of diversity and bias

Y Yu, Y Zhuang, J Zhang, Y Meng… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have been recently leveraged as training data generators
for various natural language processing (NLP) tasks. While previous research has explored …

The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …

Wild-Time: A benchmark of in-the-wild distribution shift over time

H Yao, C Choi, B Cao, Y Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc
Distribution shifts occur when the test distribution differs from the training distribution, and
can considerably degrade performance of machine learning models deployed in the real …

The state of the art in creating visualization corpora for automated chart analysis

C Chen, Z Liu - Computer Graphics Forum, 2023 - Wiley Online Library
We present a state‐of‐the‐art report on visualization corpora in automated chart analysis
research. We survey 56 papers that created or used a visualization corpus as the input of …

SciCap: Generating captions for scientific figures

TY Hsu, CL Giles, THK Huang - arXiv preprint arXiv:2110.11624, 2021 - arxiv.org
Researchers use figures to communicate rich, complex information in scientific papers. The
captions of these figures are critical to conveying effective messages. However, low-quality …

Multilayer representation of collaboration networks with higher-order interactions

E Vasilyeva, A Kozlov, K Alfaro-Bittner, D Musatov… - Scientific Reports, 2021 - nature.com
Collaboration patterns offer important insights into how scientific breakthroughs and
innovations emerge in small and large research groups. However, links in traditional …

Generating scientific definitions with controllable complexity

T August, K Reinecke, NA Smith - … of the 60th Annual Meeting of …, 2022 - aclanthology.org
Unfamiliar terminology and complex language can present barriers to understanding
science. Natural language processing stands to help address these issues by automatically …

Weakly-supervised scientific document classification via retrieval-augmented multi-stage training

R Xu, Y Yu, J Ho, C Yang - Proceedings of the 46th International ACM …, 2023 - dl.acm.org
Scientific document classification is a critical task for a wide range of applications, but the
cost of collecting human-labeled data can be prohibitive. We study scientific document …

ComGCN: Community-driven graph convolutional network for link prediction in dynamic networks

P Pham, LTT Nguyen, NT Nguyen… - … on Systems, Man …, 2021 - ieeexplore.ieee.org
Recent advances in deep learning have tremendously leveraged the performance of
network representation learning (NRL). Multiple deep learning-based NRL models have …