Automatic text summarization methods: A comprehensive review
Text summarization is the process of condensing a long text into a shorter version by
maintaining the key information and its meaning. Automatic text summarization can save …
maintaining the key information and its meaning. Automatic text summarization can save …
Large language model as attributed training data generator: A tale of diversity and bias
Large language models (LLMs) have been recently leveraged as training data generators
for various natural language processing (NLP) tasks. While previous research has explored …
for various natural language processing (NLP) tasks. While previous research has explored …
The bigscience roots corpus: A 1.6 tb composite multilingual dataset
H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …
Wild-time: A benchmark of in-the-wild distribution shift over time
Distribution shifts occur when the test distribution differs from the training distribution, and
can considerably degrade performance of machine learning models deployed in the real …
can considerably degrade performance of machine learning models deployed in the real …
The state of the art in creating visualization corpora for automated chart analysis
We present a state‐of‐the‐art report on visualization corpora in automated chart analysis
research. We survey 56 papers that created or used a visualization corpus as the input of …
research. We survey 56 papers that created or used a visualization corpus as the input of …
SciCap: Generating captions for scientific figures
Researchers use figures to communicate rich, complex information in scientific papers. The
captions of these figures are critical to conveying effective messages. However, low-quality …
captions of these figures are critical to conveying effective messages. However, low-quality …
Multilayer representation of collaboration networks with higher-order interactions
Collaboration patterns offer important insights into how scientific breakthroughs and
innovations emerge in small and large research groups. However, links in traditional …
innovations emerge in small and large research groups. However, links in traditional …
Generating scientific definitions with controllable complexity
Unfamiliar terminology and complex language can present barriers to understanding
science. Natural language processing stands to help address these issues by automatically …
science. Natural language processing stands to help address these issues by automatically …
Weakly-supervised scientific document classification via retrieval-augmented multi-stage training
Scientific document classification is a critical task for a wide range of applications, but the
cost of collecting human-labeled data can be prohibitive. We study scientific document …
cost of collecting human-labeled data can be prohibitive. We study scientific document …
ComGCN: Community-driven graph convolutional network for link prediction in dynamic networks
Recent advances in deep learning have tremendously leveraged the performance of
network representation learning (NRL). Multiple deep learning-based NRL models have …
network representation learning (NRL). Multiple deep learning-based NRL models have …