Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models

A Lozhkov, R Li, LB Allal, F Cassano… - arXiv preprint arXiv …, 2024 - arxiv.org

The BigCode project, an open-scientific collaboration focused on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In …

Cited by 116 · Related articles · All 2 versions

Robustness, security, privacy, explainability, efficiency, and usability of large language models for code

Z Yang, Z Sun, TZ Yue, P Devanbu, D Lo - arXiv preprint arXiv:2403.07506, 2024 - arxiv.org

Large language models for code (LLM4Code), which demonstrate strong performance (e.g.,
high accuracy) in processing source code, have significantly transformed software …

Cited by 26 · Related articles · All 2 versions

Exploring ChatGPT's Capabilities on Vulnerability Management

P Liu, J Liu, L Fu, K Lu, Y Xia, X Zhang… - 33rd USENIX Security …, 2024 - usenix.org

Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works
show that ChatGPT has the capabilities of processing foundational code analysis tasks …

Cited by 1 · Related articles · All 6 versions

Code Membership Inference for Detecting Unauthorized Data Use in Code Pre-trained Language Models

S Zhang, H Li - arXiv preprint arXiv:2312.07200, 2023 - arxiv.org

Code pre-trained language models (CPLMs) have received great attention since they can
benefit various tasks that facilitate software development and maintenance. However …

Cited by 3 · Related articles · All 2 versions

Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code

V Majdinasab, A Nikanjam, F Khomh - arXiv preprint arXiv:2402.09299, 2024 - arxiv.org

Code auditing ensures that the developed code adheres to standards, regulations, and
copyright protection by verifying that it does not contain code from protected sources. The …

Cited by 2 · Related articles · All 2 versions

Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach

Y Wan, G Wan, S Zhang, H Zhang, P Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent years have witnessed significant progress in developing deep learning-based
models for automated code completion. Although using source code in GitHub has been a …

Cited by 1 · Related articles · All 2 versions

An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets

J Katzy, R Popescu, A Van Deursen… - … of the 2024 IEEE/ACM First …, 2024 - dl.acm.org

Does the training of large language models potentially infringe upon code licenses?
Furthermore, are there any datasets available that can be safely used for training these …

Cited by 2 · Related articles · All 6 versions