StarCoder: may the source be with you!
The BigCode community, an open-scientific collaboration working on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder and …
Multi-step jailbreaking privacy attacks on ChatGPT
With the rapid progress of large language models (LLMs), many downstream NLP tasks can
be well solved given appropriate prompts. Though model developers and researchers work …
Evaluating the social impact of generative AI systems in systems and society
Generative AI systems across modalities, ranging from text (including code), image, audio,
and video, have broad social impacts, but there is no official standard for means of …
StarCoder 2 and The Stack v2: The next generation
The BigCode project, an open-scientific collaboration focused on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In …
A survey of large language models attribution
Open-domain generative systems have gained significant attention in the field of
conversational AI (e.g., generative search engines). This paper presents a comprehensive …
Embers of autoregression: Understanding large language models through the problem they are trained to solve
The widespread adoption of large language models (LLMs) makes it important to recognize
their strengths and limitations. We argue that in order to develop a holistic understanding of …
An archival perspective on pretraining data
Alongside an explosion in research and development related to large language models,
there has been a concomitant rise in the creation of pretraining datasets—massive …
Leak, cheat, repeat: Data contamination and evaluation malpractices in closed-source LLMs
Natural Language Processing (NLP) research is increasingly focusing on the use of Large
Language Models (LLMs), with some of the most popular ones being either fully or partially …
Investigating data contamination in modern benchmarks for large language models
Recent observations have underscored a disparity between the inflated benchmark scores
and the actual performance of LLMs, raising concerns about potential contamination of …
Towards AI Accountability Infrastructure: Gaps and Opportunities in AI Audit Tooling
Audits are critical mechanisms for identifying the risks and limitations of deployed artificial
intelligence (AI) systems. However, the effective execution of AI audits remains incredibly …