PDFTriage: Question answering over long, structured documents

S Singh, F Vargus, D Dsouza, BF Karlsson… - arXiv preprint arXiv …, 2024 - arxiv.org

Datasets are foundational to many breakthroughs in modern artificial intelligence. Many
recent achievements in the space of natural language processing (NLP) can be attributed to …

Cited by 60 · Related articles · All 2 versions

[PDF] github.io

[PDF] Qlarify: Recursively Expandable Abstracts for Directed Information Retrieval over Scientific Papers

R Fok, JC Chang, T August, AX Zhang… - arXiv preprint arXiv …, 2023 - talaugust.github.io

As scientific literature has grown exponentially, researchers often rely on paper triaging
strategies such as browsing abstracts before deciding to delve into a paper's full text …

Cited by 12 · Related articles · All 2 versions

[PDF] arxiv.org

DocFinQA: A long-context financial reasoning dataset

V Reddy, R Koncel-Kedziorski, VD Lai… - arXiv preprint arXiv …, 2024 - arxiv.org

For large language models (LLMs) to be effective in the financial domain--where each
decision can have a significant impact--it is necessary to investigate realistic tasks and data …

Cited by 7 · Related articles · All 2 versions

[PDF] arxiv.org

Anchor-based large language models

J Pang, F Ye, DF Wong, X He, W Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) predominantly employ decoder-only transformer
architectures, necessitating the retention of keys/values information for historical tokens to …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

DocXChain: A powerful open-source toolchain for document parsing and beyond

C Yao - arXiv preprint arXiv:2310.12430, 2023 - arxiv.org

In this report, we introduce DocXChain, a powerful open-source toolchain for document
parsing, which is designed and developed to automatically convert the rich information …

Cited by 5 · Related articles · All 2 versions

[PDF] arxiv.org

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

J Van Landeghem, S Maity, A Banerjee… - … on Document Analysis …, 2024 - Springer

This work explores knowledge distillation (KD) for visually-rich document (VRD) applications
such as document layout analysis (DLA) and document image classification (DIC). While …

Cited by 1 · Related articles · All 4 versions

[PDF] aclanthology.org

TruthReader: Towards Trustworthy Document Assistant Chatbot with Reliable Attribution

D Li, X Hu, Z Sun, B Hu, S Ye, Z Shan… - Proceedings of the …, 2024 - aclanthology.org

Document assistant chatbots are empowered with extensive capabilities by Large Language
Models (LLMs) and have exhibited significant advancements. However, these systems may …

[PDF] acm.org

Qlarify: Recursively Expandable Abstracts for Dynamic Information Retrieval over Scientific Papers

R Fok, JC Chang, T August, AX Zhang… - Proceedings of the 37th …, 2024 - dl.acm.org

Navigating the vast scientific literature often starts with browsing a paper's abstract.
However, when a reader seeks additional information, not present in the abstract, they face …

DocHieNet: A Large and Diverse Dataset for Document Hierarchy Parsing

H Xing, C Cheng, F Gao, Z Shao, Z Yu… - Proceedings of the …, 2024 - aclanthology.org

Parsing documents from pixels, such as pictures and scanned PDFs, into hierarchical
structures is extensively demanded in the daily routines of data storage, retrieval and …

[PDF] arxiv.org

M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

YK Chia, L Cheng, HP Chan, C Liu, M Song… - arXiv preprint arXiv …, 2024 - arxiv.org

The ability to understand and answer questions over documents can be useful in many
business and practical applications. However, documents often contain lengthy and diverse …