Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models

A Nyffenegger, M Stürmer, J Niklaus - arXiv preprint arXiv:2308.11103, 2023 - arxiv.org
Anonymity of both natural and legal persons in court rulings is a critical aspect of privacy
protection in the European Union and Switzerland. With the advent of LLMs, concerns about …

Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect

J Vamvas, N Aepli, R Sennrich - arXiv preprint arXiv:2401.14400, 2024 - arxiv.org
Creating neural text encoders for written Swiss German is challenging due to a dearth of
training data combined with dialectal variation. In this paper, we build on several existing …

Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

M Raj, LK Soon, HF Ong… - Proceedings of the 9th …, 2024 - aclanthology.org
Malaysian English is a low resource creole language, where it carries the elements of
Malay, Chinese, and Tamil languages, in addition to Standard English. Named Entity …

Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

MR Chanthran, LK Soon, HF Ong… - arXiv preprint arXiv …, 2024 - arxiv.org
Malaysian English is a low resource creole language, where it carries the elements of
Malay, Chinese, and Tamil languages, in addition to Standard English. Named Entity …

Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents

J Grosjean, J Vamvas - arXiv preprint arXiv:2405.07513, 2024 - arxiv.org
Encoder models trained for the embedding of sentences or short documents have proven
useful for tasks such as semantic search and topic modeling. In this paper, we present a …

Swissdox@LiRI – a large database of media articles made accessible to researchers

J Graën, I Mustac, N Rajovic, J Schaber… - CLARIN annual …, 2023 - zora.uzh.ch
This article presents our efforts to make a large collection of Swiss newspaper articles
available for research purposes. We describe the resource, detail the concept of financing …

Introducing embed2discover: A tool for semi-automated, dictionary-based content-analysis

L Brandenberger, O Bakhteev, JM Fernandez… - files.osf.io
We introduce embed2discover, a new tool for dictionary-based content analysis. The tool
combines state-of-the-art machine learning and language model methodologies with …

Swissdox@LiRI – a large database of media articles made accessible to researchers

J Schaber, J Graën, I Mustač, N Rajović, G Schneider… - clarin.eu
The 'Schweizer Mediendatenbank AG' (SMD) is a nonprofit joint venture of three big Swiss
media groups with the purpose of collecting print and online publications, as well as TV …