The influence of preprocessing on text classification using a bag-of-words representation

Y HaCohen-Kerner, D Miller, Y Yigal - PloS one, 2020 - journals.plos.org
Text classification (TC) is the task of automatically assigning documents to a fixed number of
categories. TC is an important component in many text applications. Many of these …

Sexism identification in social networks using TF-IDF embeddings, preprocessing, feature selection, word/Char N-grams and various machine learning models …

R Keinan - Working Notes of CLEF, 2024 - ceur-ws.org
In this paper, we describe our submission to the EXIST-2024 contest. We tackled Task 1-
“Sexism Identification in tweets” in English and Spanish. To classify the tweets as texts …

JCT at SemEval-2023 Tasks 12 A and 12B: Sentiment Analysis for Tweets Written in Low-resource African Languages using Various Machine Learning and Deep …

R Keinan, Y HaCohen-Kerner - Proceedings of the 17th …, 2023 - aclanthology.org
In this paper, we describe our submissions to the SemEval-2023 contest. We tackled
subtask 12-“AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages …

Text Mining at SemEval-2024 Task 1: Evaluating Semantic Textual Relatedness in Low-resource Languages using Various Embedding Methods and Machine …

R Keinan - Proceedings of the 18th International Workshop on …, 2024 - aclanthology.org
In this paper, I describe my submission to the SemEval-2024 contest. I tackled subtask 1-
“Semantic Textual Relatedness for African and Asian Languages”. To find the semantic …

Detection of Anorexic Girls-In Blog Posts Written in Hebrew Using a Combined Heuristic AI and NLP Method

Y Hacohen-Kerner, N Manor, M Goldmeier… - IEEE …, 2022 - ieeexplore.ieee.org
In this study, we aim to detect in social media texts written in Hebrew girls who are
suspected of being anorexic. We constructed a dataset containing 100 blog posts written by …

Detecting Offensive Language in English Hindi and Marathi using Classical Supervised Machine Learning Methods and Word/Char N-grams.

Y HaCohen-Kerner, M Uzan - FIRE (Working Notes), 2021 - researchgate.net
In this paper, we describe our submissions for the HASOC 2021 contest. We tackled subtask
1A that addresses the problem of hate speech and offensive language identification in three …

JCT at SemEval-2022 Task 6-A: Sarcasm Detection in Tweets Written in English and Arabic using Preprocessing Methods and Word N-grams

Y HaCohen-Kerner, M Fchima… - Proceedings of the 16th …, 2022 - aclanthology.org
In this paper, we describe our submissions to the SemEval-2022 contest. We tackled subtask
6-A-“iSarcasmEval: Intended Sarcasm Detection In English and Arabic–Binary Classification” …

Detecting Fake News in URDU using Classical Supervised Machine Learning Methods and Word/Char N-grams.

Y HaCohen-Kerner, N Manor, N Bashan… - FIRE (Working Notes …, 2021 - ceur-ws.org
In this paper, we describe our submissions for the UrduFake 2021 track. We tackled the task
entitled “Fake News Detection in the Urdu Language”. We developed different models using …