Impact of tokenization on language models: An analysis for Turkish
Tokenization is an important text preprocessing step to prepare input tokens for deep
language models. WordPiece and BPE are de facto methods employed by important …
Named entity recognition in Turkish: A comparative study with detailed error analysis
Named entity recognition aims to detect pre-determined entity types in unstructured text.
There is a limited number of studies on this task for low-resource languages such as Turkish …
An approach to automatic classification of hate speech in sports domain on social media
S Vujičić Stanković, M Mladenović - Journal of Big Data, 2023 - Springer
Hate Speech encompasses different forms of trolling, bullying, harassment, and threats
directed against specific individuals or groups. This phenomenon is mainly expressed on …
What did you learn to hate? A topic-oriented analysis of generalization in hate speech detection
T Bourgeade, P Chiril, F Benamara… - Proceedings of the 17th …, 2023 - aclanthology.org
Hate speech has unfortunately become a significant phenomenon on social media
platforms, and it can cover various topics (misogyny, sexism, racism, xenophobia, etc.) and …
MetaHate: A dataset for unifying efforts on hate speech detection
Hate speech represents a pervasive and detrimental form of online discourse, often
manifested through an array of slurs, from hateful tweets to defamatory posts. As such …
ARC-NLP at multimodal hate speech event detection 2023: Multimodal methods boosted by ensemble learning, syntactical and entity features
Text-embedded images can serve as a means of spreading hate speech, propaganda, and
extremist beliefs. Throughout the Russia-Ukraine war, both opposing factions heavily relied …
Revisiting hate speech benchmarks: From data curation to system deployment
Social media is awash with hateful content, much of which is often veiled with linguistic and
topical diversity. The benchmark datasets used for hate speech detection do not account for …
Resources for Turkish natural language processing: A critical survey
This paper presents a comprehensive survey of corpora and lexical resources available for
Turkish. We review a broad range of resources, focusing on the ones that are publicly …
NEHATE: Large-scale annotated data shedding light on hate speech in Nepali local election discourse
The use of social media during election campaigns has become increasingly popular.
However, the unbridled nature of online discourse can lead to the propagation of hate …
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection
This paper compares different pre-trained and fine-tuned large language models (LLMs) for
hate speech detection. Our research underscores challenges in LLMs' cross-domain validity …