UltraFeedback: Boosting language models with high-quality feedback

G Cui, L Yuan, N Ding, G Yao, W Zhu, Y Ni, G Xie, Z Liu… - 2023 - openreview.net
Reinforcement learning from human feedback (RLHF) has become a pivotal technique in
aligning large language models (LLMs) with human preferences. In RLHF practice …

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

Large language models for code analysis: Do LLMs really do their job?

C Fang, N Miao, S Srivastav, J Liu, R Zhang… - 33rd USENIX Security …, 2024 - usenix.org
Large language models (LLMs) have demonstrated significant potential in the realm of
natural language understanding and programming code processing tasks. Their capacity to …

DocMath-eval: Evaluating math reasoning capabilities of LLMs in understanding long and specialized documents

Y Zhao, Y Long, H Liu, R Kamoi, L Nan… - Proceedings of the …, 2024 - aclanthology.org
Recent LLMs have demonstrated remarkable performance in solving exam-like math word
problems. However, the degree to which these numerical reasoning skills are effective in …

KnowledgeFMath: A knowledge-intensive math reasoning dataset in finance domains

Y Zhao, H Liu, Y Long, R Zhang, C Zhao… - Proceedings of the …, 2024 - aclanthology.org
We introduce KnowledgeFMath, a novel benchmark designed to evaluate LLMs' capabilities
in solving knowledge-intensive math reasoning problems. Compared to prior works, this …

StudentEval: a benchmark of student-written prompts for large language models of code

HML Babe, S Nguyen, Y Zi, A Guha… - arXiv preprint arXiv …, 2023 - arxiv.org
Code LLMs are being rapidly deployed and there is evidence that they can make
professional programmers more productive. Current benchmarks for code generation …

Towards understanding the capability of large language models on code clone detection: a survey

S Dou, J Shan, H Jia, W Deng, Z Xi, W He, Y Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Code cloning, the duplication of code fragments, is common in software development. While
some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs …

DocMath-eval: Evaluating numerical reasoning capabilities of LLMs in understanding long documents with tabular data

Y Zhao, Y Long, H Liu, L Nan, L Chen, R Kamoi… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent LLMs have demonstrated remarkable performance in solving exam-like math word
problems. However, the degree to which these numerical reasoning skills are effective in …

Coffee: Boost your code LLMs by fixing bugs with feedback

S Moon, H Chae, Y Song, T Kwon, D Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Code editing is an essential step towards reliable program synthesis to automatically correct
critical errors generated from code LLMs. Recent studies have demonstrated that closed …

Text2Analysis: A benchmark of table question answering with advanced data analysis and unclear queries

X He, M Zhou, X Xu, X Ma, R Ding, L Du… - Proceedings of the …, 2024 - ojs.aaai.org
Tabular data analysis is crucial in various fields, and large language models show promise
in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL …