CODE-MVP: Learning to represent source code from multiple views with contrastive pre-training
Recent years have witnessed increasing interest in code representation learning, which
aims to represent the semantics of source code into distributed vectors. Currently, various …
aims to represent the semantics of source code into distributed vectors. Currently, various …
Models are codes: Towards measuring malicious code poisoning attacks on pre-trained model hubs
The proliferation of pre-trained models (PTMs) and datasets has led to the emergence of
centralized model hubs like Hugging Face, which facilitate collaborative development and …
centralized model hubs like Hugging Face, which facilitate collaborative development and …
Enhancing comprehension and navigation in Jupyter notebooks with static analysis
Jupyter notebooks enable developers to interleave code snippets with rich-text and in-line
visualizations. Data scientists use Jupyter notebook as the de-facto standard for creating …
visualizations. Data scientists use Jupyter notebook as the de-facto standard for creating …
Peatmoss: A dataset and initial analysis of pre-trained models in open-source software
The development and training of deep learning models have become increasingly costly
and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for …
and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for …
Data leakage in notebooks: Static detection and better processes
Data science pipelines to train and evaluate models with machine learning may contain
bugs just like any other code. Leakage between training and test data can lead to …
bugs just like any other code. Leakage between training and test data can lead to …
Static analysis driven enhancements for comprehension in machine learning notebooks
APS Venkatesh, S Sabu, M Chekkapalli… - Empirical Software …, 2024 - Springer
Jupyter notebooks have emerged as the predominant tool for data scientists to develop and
share machine learning solutions, primarily using Python as the programming language …
share machine learning solutions, primarily using Python as the programming language …
Investigating and Detecting Silent Bugs in PyTorch Programs
Deep Learning (DL) has been widely applied in various fields. Unlike traditional software,
DL programs possess the “black box” characteristic that can make it challenging for …
DL programs possess the “black box” characteristic that can make it challenging for …
Exploring Hyperparameter Usage and Tuning in Machine Learning Research
The success of machine learning (ML) models depends on careful experimentation and
optimization of their hyperparameters. Tuning can affect the reliability and accuracy of a …
optimization of their hyperparameters. Tuning can affect the reliability and accuracy of a …
Hard to Read and Understand Pythonic Idioms? DeIdiom and Explain Them in Non-Idiomatic Equivalent Code
The Python community strives to design pythonic idioms so that Python users can achieve
their intent in a more concise and efficient way. According to our analysis of 154 questions …
their intent in a more concise and efficient way. According to our analysis of 154 questions …
Complex Python features in the wild
Y Yang, A Milanova, M Hirzel - … of the 19th International Conference on …, 2022 - dl.acm.org
While Python is increasingly popular, program analysis tooling for Python is lagging. This is
due, in part, to complex features of the Python language---features with difficult to …
due, in part, to complex features of the Python language---features with difficult to …