HeRo: RoBERTa and Longformer Hebrew Language Models
V Shalumov, H Haskey - arXiv preprint arXiv:2304.11077, 2023 - arxiv.org
In this paper, we fill in an existing gap in resources available to the Hebrew NLP community
by providing it with the largest so far pre-train dataset HeDC4, a state-of-the-art pre-trained …
HeQ: A Large and Diverse Hebrew Reading Comprehension Benchmark
A Cohen, H Merhav-Fine, Y Goldberg… - Findings of the …, 2023 - aclanthology.org
Current benchmarks for Hebrew Natural Language Processing (NLP) focus mainly
on morpho-syntactic tasks, neglecting the semantic dimension of language understanding …
Multilingual Sequence-to-Sequence Models for Hebrew NLP
Recent work attributes progress in NLP to large language models (LMs) with increased
model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs …
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Despite it being the cornerstone of BPE, the most common tokenization algorithm, the
importance of compression in the tokenization process is still unclear. In this paper, we …
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development
Y Marmor, K Misgav, Y Lifshitz - arXiv preprint arXiv:2307.08720, 2023 - arxiv.org
We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct
lack of extensive, high-quality resources for advancing Automated Speech Recognition …
MRL Parsing Without Tears: The Case of Hebrew
S Shmidman, A Shmidman, M Koppel… - arXiv preprint arXiv …, 2024 - arxiv.org
Syntactic parsing remains a critical tool for relation extraction and information extraction,
especially in resource-scarce languages where LLMs are lacking. Yet in morphologically …
Requirements Engineering for LLM: The Case of Digital Inquiries Application
Communication between family physicians (FPs) and patients via digital inquiry systems has
become a widespread practice, often replacing in-person meetings and phone calls. Studies …
Embible: Reconstruction of Ancient Hebrew and Aramaic Texts Using Transformers
N Fono, H Moshayof, E Karol, I Assraf… - Findings of the …, 2024 - aclanthology.org
Hebrew and Aramaic inscriptions serve as an essential source of information on the ancient
history of the Near East. Unfortunately, some parts of the inscribed texts become illegible …
OtoBERT: Identifying Suffixed Verbal Forms in Modern Hebrew Literature
A Shmidman, S Shmidman - … of the Third Workshop on Text …, 2024 - aclanthology.org
We provide a solution for a specific morphological obstacle which often makes Hebrew
literature difficult to parse for the younger generation. The morphologically-rich nature of the …
Using ChatGPT and Other AI Engines to Vocalize Medieval Hebrew
N Gordon - Journal of Data Mining & Digital Humanities, 2024 - jdmdh.episciences.org
Hebrew is usually written without vowel points, making it challenging for some readers to
decipher. This is especially true of medieval Hebrew, which can have nonstandard grammar …