Aya dataset: An open-access collection for multilingual instruction tuning
S Singh, F Vargus, D Dsouza, BF Karlsson… - arXiv preprint arXiv …, 2024 - arxiv.org
Datasets are foundational to many breakthroughs in modern artificial intelligence. Many
recent achievements in the space of natural language processing (NLP) can be attributed to …
Qlarify: Recursively Expandable Abstracts for Directed Information Retrieval over Scientific Papers
As scientific literature has grown exponentially, researchers often rely on paper triaging
strategies such as browsing abstracts before deciding to delve into a paper's full text …
DocFinQA: A long-context financial reasoning dataset
For large language models (LLMs) to be effective in the financial domain--where each
decision can have a significant impact--it is necessary to investigate realistic tasks and data …
Anchor-based large language models
Large language models (LLMs) predominantly employ decoder-only transformer
architectures, necessitating the retention of keys/values information for historical tokens to …
DocXChain: A powerful open-source toolchain for document parsing and beyond
C Yao - arXiv preprint arXiv:2310.12430, 2023 - arxiv.org
In this report, we introduce DocXChain, a powerful open-source toolchain for document
parsing, which is designed and developed to automatically convert the rich information …
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
J Van Landeghem, S Maity, A Banerjee… - … on Document Analysis …, 2024 - Springer
This work explores knowledge distillation (KD) for visually-rich document (VRD) applications
such as document layout analysis (DLA) and document image classification (DIC). While …
TruthReader: Towards Trustworthy Document Assistant Chatbot with Reliable Attribution
Document assistant chatbots are empowered with extensive capabilities by Large Language
Models (LLMs) and have exhibited significant advancements. However, these systems may …
Qlarify: Recursively Expandable Abstracts for Dynamic Information Retrieval over Scientific Papers
Navigating the vast scientific literature often starts with browsing a paper's abstract.
However, when a reader seeks additional information, not present in the abstract, they face …
DocHieNet: A Large and Diverse Dataset for Document Hierarchy Parsing
Parsing documents from pixels, such as pictures and scanned PDFs, into hierarchical
structures is extensively demanded in the daily routines of data storage, retrieval and …
M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
The ability to understand and answer questions over documents can be useful in many
business and practical applications. However, documents often contain lengthy and diverse …