Large pre-trained models with extra-large vocabularies: A contrastive analysis of Hebrew...

V Shalumov, H Haskey - arXiv preprint arXiv:2304.11077, 2023 - arxiv.org

In this paper, we fill in an existing gap in resources available to the Hebrew NLP community
by providing it with the largest so far pre-train dataset HeDC4, a state-of-the-art pre-trained …

Cited by 6 · Related articles · All 2 versions

HeQ: a large and diverse Hebrew reading comprehension benchmark

A Cohen, H Merhav-Fine, Y Goldberg… - Findings of the …, 2023 - aclanthology.org

Current benchmarks for Hebrew Natural Language Processing (NLP) focus mainly
on morpho-syntactic tasks, neglecting the semantic dimension of language understanding …

Cited by 5 · Related articles · All 5 versions

Multilingual sequence-to-sequence models for Hebrew NLP

M Eyal, H Noga, R Aharoni, I Szpektor… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent work attributes progress in NLP to large language models (LMs) with increased
model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs …

Cited by 4 · Related articles · All 7 versions

Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance

O Goldman, A Caciularu, M Eyal, K Cao… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite it being the cornerstone of BPE, the most common tokenization algorithm, the
importance of compression in the tokenization process is still unclear. In this paper, we …

Cited by 9 · Related articles · All 2 versions

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

Y Marmor, K Misgav, Y Lifshitz - arXiv preprint arXiv:2307.08720, 2023 - arxiv.org

We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct
lack of extensive, high-quality resources for advancing Automated Speech Recognition …

Cited by 3 · Related articles · All 3 versions

MRL Parsing Without Tears: The Case of Hebrew

S Shmidman, A Shmidman, M Koppel… - arXiv preprint arXiv …, 2024 - arxiv.org

Syntactic parsing remains a critical tool for relation extraction and information extraction,
especially in resource-scarce languages where LLMs are lacking. Yet in morphologically …

Cited by 2 · Related articles · All 2 versions

Requirements Engineering for LLM: The Case of Digital Inquiries Application

A Solomon, M Levy, D Agur-Cohen… - 2024 IEEE 32nd …, 2024 - ieeexplore.ieee.org

Communication between family physicians (FPs) and patients via digital inquiry systems has
become a widespread practice, often replacing in-person meetings and phone calls. Studies …

Embible: Reconstruction of Ancient Hebrew and Aramaic Texts Using Transformers

N Fono, H Moshayof, E Karol, I Assraf… - Findings of the …, 2024 - aclanthology.org

Hebrew and Aramaic inscriptions serve as an essential source of information on the ancient
history of the Near East. Unfortunately, some parts of the inscribed texts become illegible …

Cited by 3 · Related articles · All 2 versions

OtoBERT: Identifying Suffixed Verbal Forms in Modern Hebrew Literature

A Shmidman, S Shmidman - … of the Third Workshop on Text …, 2024 - aclanthology.org

We provide a solution for a specific morphological obstacle which often makes Hebrew
literature difficult to parse for the younger generation. The morphologically-rich nature of the …

Using ChatGPT and Other AI Engines to Vocalize Medieval Hebrew

N Gordon - Journal of Data Mining & Digital Humanities, 2024 - jdmdh.episciences.org

Hebrew is usually written without vowel points, making it challenging for some readers to
decipher. This is especially true of medieval Hebrew, which can have nonstandard grammar …