The art and practice of data science pipelines: A comprehensive study of data science pipelines in theory, in-the-small, and in-the-large

S Biswas, M Wardat, H Rajan - … of the 44th International Conference on …, 2022 - dl.acm.org
Increasingly larger number of software systems today are including data science
components for descriptive, predictive, and prescriptive analytics. The collection of data …

A comprehensive study on deep learning bug characteristics

MJ Islam, G Nguyen, R Pan, H Rajan - … of the 2019 27th ACM joint …, 2019 - dl.acm.org
Deep learning has gained substantial popularity in recent years. Developers mainly rely on
libraries and tools to add deep learning capabilities to their software. What kinds of bugs are …

The prevalence of code smells in machine learning projects

B Van Oort, L Cruz, M Aniche… - 2021 IEEE/ACM 1st …, 2021 - ieeexplore.ieee.org
Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer
science landscape. Yet, there still exists a lack of software engineering experience and best …

A preliminary investigation of MLOps practices in GitHub

F Calefato, F Lanubile, L Quaranta - Proceedings of the 16th ACM/IEEE …, 2022 - dl.acm.org
Background. The rapid and growing popularity of machine learning (ML) applications has
led to an increasing interest in MLOps, that is, the practice of continuous integration and …

An empirical study for common language features used in python projects

Y Peng, Y Zhang, M Hu - 2021 IEEE International Conference …, 2021 - ieeexplore.ieee.org
As a dynamic programming language, Python is widely used in many fields. For developers,
various language features affect programming experience. For researchers, they affect the …

A large-scale comparative analysis of coding standard conformance in open-source data science projects

AJ Simmons, S Barnett, J Rivera-Villicana… - Proceedings of the 14th …, 2020 - dl.acm.org
Background: Meeting the growing industry demand for Data Science requires cross-
disciplinary teams that can translate machine learning research into production-ready code …

23 shades of self-admitted technical debt: An empirical study on machine learning software

D OBrien, S Biswas, S Imtiaz, R Abdalkareem… - Proceedings of the 30th …, 2022 - dl.acm.org
In software development, the term “technical debt” (TD) is used to characterize short-term
solutions and workarounds implemented in source code which may incur a long-term cost …

The yin yang of AI: Exploring how commercial and non-commercial orientations shape machine learning innovation

E Brea - Research Policy, 2024 - Elsevier
The scale of the potential implications of machine learning (ML) has prompted discussions
on the issues of corporate control and technological openness. However, how commercial …

An exploratory study on the predominant programming paradigms in Python code

R Dyer, J Chauhan - Proceedings of the 30th ACM Joint European …, 2022 - dl.acm.org
Python is a multi-paradigm programming language that fully supports object-oriented (OO)
programming. The language allows writing code in a non-procedural imperative manner …

Actor concurrency bugs: a comprehensive study on symptoms, root causes, API usages, and differences

M Bagherzadeh, N Fireman, A Shawesh… - Proceedings of the …, 2020 - dl.acm.org
Actor concurrency is becoming increasingly important in the development of real-world
software systems. Although actor concurrency may be less susceptible to some …