Towards Machine Learning Guided by Best Practices
Anamaria Mojica-Hanke
University of Passau
Passau, Germany
Universidad de los Andes
Bogota, Colombia
[email protected]
Abstract—Nowadays, machine learning (ML) is being used in software systems with multiple application fields, from medicine to software engineering (SE). On the one hand, the popularity of ML in industry can be seen in the statistics showing its growth and adoption. On the other hand, its popularity can also be seen in research, particularly in SE, where multiple studies related to the use of machine learning in software engineering have been published in conferences and journals. At the same time, researchers and practitioners have shown that machine learning has some particular challenges and pitfalls. In particular, research has shown that ML-enabled systems have a different development process than traditional software, which also explains some of the challenges of ML applications. In order to mitigate some of the identified challenges and pitfalls, white and gray literature has proposed sets of recommendations based on the authors' own experiences and focused on their domain (e.g., biomechanics), but to the best of our knowledge, there is no guideline focused on the SE community. This thesis aims to reduce the gap of not having clear guidelines in the SE community by using possible sources of practices, such as question-and-answer communities, as well as previous research studies. As a result, we will present a set of practices with an SE perspective, for researchers and practitioners, including a tool for searching them.

Index Terms—Machine learning, good practices, software engineering

I. MOTIVATION - PROBLEM DEFINITION

Machine learning (ML) has multiple fields of application, such as finance, medicine, education, and software engineering. Indeed, in each of these fields, ML has different kinds of applications; for example, fraud detection [1] and trading [2]; cancer detection [3] and outbreak prediction [4]; program translation [5]; and code transformation [6]. These examples show the wide adoption of ML across different tasks and disciplines and its capability to affect multiple domains.

The aforementioned impact can also be seen in industry, as ML has grown not only in usage demand but also in popularity and benefits. For instance, a McKinsey report by Chui et al. states that 50% of the respondents to their study answered that their companies had adopted artificial intelligence in at least one business function [7]; this is also supported by the most recent NewVantage Partners 2022 Data and AI Executive Survey (NPAIS), which shows that 91% of organizations are investing in AI activities [8]. Complementarily, the same survey reports that 92.1% of organizations are realizing measurable benefits [8], which can also be seen in the McKinsey survey, in which it is reported that AI has helped to increase the revenue of companies in sales and product development [7].

On the not-so-bright side, recent studies have shown that ML-enabled systems (i.e., systems that have at least one ML component) have challenges [9], pitfalls [10]–[12], problems [13], or mismatches [14], as does any software development process and system. Some studies have also shown that ML challenges and problems are publicly discussed in communities such as Stack Overflow [9], [15], [16]. Moreover, some studies indicate that ML systems have particular problems and challenges [13], [14], which could be related to inadequate documentation and communication across the different actors involved in ML development [14], or to technical debt [13]. In addition, the development of ML systems involves phases that differ from the traditional software ones [17], and in each of those phases different challenges can arise [9], [14], [16].

Concerning ML challenges from an industry perspective, some surveys show common fears and challenges faced by companies, including problems when collecting data [18], data quality [18], and versioning and reproducibility of the models [19]. There are also some risks associated with artificial intelligence, such as equity and fairness or personal/individual privacy, that companies are working to mitigate [7].
In order to avoid, mitigate, or deal with the ML challenges, pitfalls, and problems, some studies have proposed a series of recommended guidelines and best practices based on their own experience and focused on their respective discipline, e.g., [11], [12], [20]. Additionally, there is a plethora of publications (e.g., books, blogs) on the field of ML; for example, grey literature, such as the Google article by Zinkevich [21], is publicly available and could be considered a first step with general practices derived from anecdotal evidence. However, to the best of our knowledge, there are no handbooks listing best practices for using ML that focus on SE practitioners or researchers, i.e., software engineers and researchers. As ML is becoming more and more involved in SE development projects, bad practices should be avoided to prevent inadequate model planning, implementation, tuning, testing, deployment, and monitoring of ML implementations, e.g., [14], [17]. The interest in ML has also been displayed in the SE community, as more workshops and conferences related to ML are being co-located with SE conferences, e.g., [22]–[27]. We aim to reduce the gap of not having the aforementioned handbook by (i) studying what the best practices discussed by practitioners are; (ii) analyzing what the practices used by SE researchers when executing studies that involve ML models are; and (iii) building a taxonomy and a handbook of ML practices that could be used by the SE community that uses ML, in order to guide their ML development and research.

II. RELATED WORK

As previously mentioned, there is a plethora of ML literature; this literature can be peer-reviewed (i.e., white literature) or not (e.g., technical reports and blog posts) [28].

A. Gray literature

A quick search on Google using the keywords "machine learning" yields more than 647 million results; if we further specify the query to include "machine learning" AND "practices," the number of hits is reduced to more than 95 million, and it gets lower if we search for "best practices" instead, which yields about 40 million results. This shows a significant amount of information that could be related to finding good guidelines when developing an ML system.

Some of the gray literature that can be found when searching for ML practices or guidelines is published by recognized institutions such as SAS [29], Google [21], [30], or Carnegie Mellon [31]. In particular, the aforementioned works present lists of practices and guidelines based on the authors' experience. The last two institutions, [21], [31], present the practices in a broad and general way. By broad and general, we mean that the practices are not specific to use cases but are general recommendations, such as "thinking about whether the usage of ML/AI is beneficial and necessary or not".

Regarding more detailed practices, Google [30] presents a set of practices related to designing with AI. The practices are presented in different ways: for example, they are associated with a series of case studies, organized by chapters that are milestones in the product development flow, or they are retrieved by questions that guide the practice search. In addition, SAS [29] presents a series of practices that are associated with different stages of the ML development process, i.e., data preparation, training, and deployment. For each of these stages, a brief description of what the stage encompasses is given, followed by a discussion of possible approaches, e.g., ways to deploy a model. However, this report is missing some ML pipeline stages, like model requirements (i.e., the stage in which "designers decide the functionalities that should be included in an ML system, their usefulness for new or existing products"), model evaluation, data labeling, and model monitoring.

In addition, some of the gray literature covered consists of preprints or papers that are not peer-reviewed, e.g., [12], [32]. Lones [12] presents a series of challenges encountered during their time in academia (i.e., teaching and researching). For some of the pitfalls, a possible way to handle or avoid the error is presented. The work mainly focuses on research and on properties (e.g., robustness, reliability) rather than on stages of the development of the ML system. Clemmedsson [32] focuses only on ML pitfalls and challenges in an industrial case study where, after a literature review on possible challenges, employees in four companies were surveyed. As a result, they identify that some challenges are similar to the traditional SE ones, but some are ML-specific, showing that not all ML challenges can be addressed in the same way as traditional SE challenges and problems.

B. White literature

Some white literature encompasses best practices in different fields of knowledge without targeting the relation between ML and SE. For instance, [20] studies pitfalls and challenges in biomechanics, followed by some practices to deal with them. In addition, [33] also focuses on pitfalls, but in relation to the use of ML in omics data science, and gives guidelines to avoid those pitfalls, without giving details on how those guidelines were extracted.

Regarding the white literature that discusses ML and SE, there are a couple of studies [17], [34]–[37], but with a different approach than the desired one (i.e., ML for SE). Some areas of SE, such as defect modelling, use ML as a tool to achieve their purpose, which in this case would be to "understand actionable insights in order to make better decisions related to the different phases of software practice" [34]. Associated with this use of ML, research has been conducted to find and understand problems and challenges, also giving recommendations on how to handle them [34]. However, those guidelines do not only focus on ML but also on different aspects that are associated with the specific ML applications, without a clear separation of those two aspects. In addition, those guides are not always associated with other possible SE applications in which they could be applied. This association is also missing in the study by Arp et al. [37], which identifies ten pitfalls in learning-based security systems and evaluates their existence in the security literature that uses ML. They also give recommendations on how to avoid the pitfalls.

The other three aforementioned studies that relate SE and ML, [17], [35], [36], are the ones most closely related to our approach. First, Breck et al. [35] list 28 practices for testing and monitoring different stages of the ML development process. Amershi et al. [17] conducted a study that reports a broad range of practices and challenges observed in software teams at Microsoft as they develop AI-based applications. Nevertheless, the set of challenges and practices is broad and often not actionable, and the study focuses on a single enterprise (Microsoft). Finally, Serban et al. [36] list a series of best practices for ML applications. The presented practices are mined from academic and gray literature with an engineering perspective, meaning that the practices are formulated from an engineering point of view rather than an ML one, i.e., software engineering for ML (SE4ML). Serban et al. [36] also present a taxonomy of the SE4ML practices with six categories: data, training, coding, deployment, team, and governance. This taxonomy was validated via a survey in which they asked the respondents (i.e., researchers and practitioners) about the adoption of the identified practices. The authors also surveyed whether adopting a set of practices would lead to a desired effect (e.g., agility, software quality, and traceability). As a result of their study, they present the list of practices in their article and provide an online tool in which the practices are presented in more detail, organized by the aforementioned taxonomy.

Since our goal is to help reduce the gap of not having a clear handbook of ML practices applied to SE, we want to build on the strengths identified in the related work to accomplish that goal. This means that (i) we focus on the approach of ML for SE, trying to understand the perspective of researchers but also practitioners; (ii) we give context to the practices and not only the practice itself, so that the practices are actionable and meaningful; and (iii) related to the last point, we want to provide not only case studies in which the practices are used but also help interested readers identify what task they are trying to achieve/execute in an ML pipeline.
III. RESEARCH QUESTIONS

The following research questions (RQs) aim to reduce the gap towards having a clear source of ML practices oriented to SE. To that end, the RQs will be answered in a crowdsourced, data-driven way, meaning that the information supporting them does not come from a single, centralized source, but from multiple sources with different origins.

RQ1. What is the perspective of ML practitioners on best practices, and in which ML stages are they located?

Answering this research question will help the practitioners' community understand which stages of the process of developing ML systems have best practices associated with them, and it could also help avoid pitfalls caused by omitting, knowingly or not, technical requirements or knowledge of the system. We will answer this question by studying Stack Exchange posts in which practitioners ask questions about different topics, including ML, as mentioned in previous literature. In order to minimize false positives, we will filter the relevant posts based on topic (ML) and quality. After filtering the data, an analysis process will be carried out in order to extract the possible practices. Subsequently, the practices will be validated by ML experts in order to filter out the practices that may be outdated or are not considered good practices.
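As an illustration of the filtering step, the following is a minimal sketch in Python. It assumes the posts have already been loaded into a pandas DataFrame; the column names, tag list, and thresholds are hypothetical placeholders for illustration, not the exact ones used in our study:

```python
import pandas as pd

# Hypothetical set of tags used to mark a post as ML-related;
# the validated tag list in the actual study may differ.
ML_TAGS = {"machine-learning", "deep-learning", "neural-network", "scikit-learn"}

def filter_ml_posts(posts: pd.DataFrame, min_score: int = 1) -> pd.DataFrame:
    """Keep posts that are ML-related (topic filter) and pass simple
    quality thresholds (minimum score, accepted answer present)."""
    is_ml = posts["tags"].apply(lambda tags: bool(ML_TAGS & set(tags)))
    is_quality = (posts["score"] >= min_score) & posts["accepted_answer_id"].notna()
    return posts[is_ml & is_quality]

# Example usage with toy data: only the first post survives both filters.
posts = pd.DataFrame({
    "tags": [["python", "machine-learning"], ["java"], ["deep-learning"]],
    "score": [5, 10, 0],
    "accepted_answer_id": [101, 102, None],
})
print(filter_ml_posts(posts))
```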
RQ2. What is the perspective and adoption of ML practices by researchers and their studies?

The practices identified in this research question will give an indication of what practices are being used and reported by the SE research community. This will help the SE research community to (i) identify possible points to strengthen their research and to focus on when describing their studies and protocols, and (ii) identify possible good practices with SE examples, which could facilitate the use of good practices and help avoid mistakes. We will answer this question by sampling ML-related papers from SE conferences and identifying ML practices, which will then be categorized into the different ML pipeline stages and SE applications (e.g., defect modeling). Complementary to this, we will conduct a survey and interviews with ML research experts in order to identify their opinion on the use of ML practices and the consequences of omitting them.

RQ3. What are the practices identified and adopted by practitioners and researchers?

Answering this RQ will give both practitioners and researchers a better perspective on the used and identified ML practices. This will help the SE community, in general, to be aware of possible practices that are being used and/or omitted. For this RQ, we will compile a handbook of practices from the perspective of SE researchers and practitioners. As this RQ complements the previous two RQs, we will consider the results obtained in RQ1 and RQ2 while comparing and complementing the identified practices from both perspectives. In addition, we will enrich the practices with complementary information, such as use cases and previous research, to provide context and examples of their use. We will also indicate the nature of the perspective (i.e., researchers, practitioners, or both).

RQ4. To what extent do the identified practices affect previous research?

Understanding how the use (or lack of use) of the practices affects the results of research studies will give the community an idea of the impact and importance this could have. Understanding the impact could generate more awareness of the use and reporting of the ML practices followed during a study. For this, we will use a sample of replicable SE studies that use ML, which allows us to obtain the same or similar results and then apply or omit ML practices and evaluate how that affects the results reported by the studies. Kindly note that the study will be executed in a way in which, when reporting the results, they will not directly point to specific studies. This study will be a reflective exercise rather than a finger-pointing one, as was previously done in the study conducted by Arp et al. [37] when identifying dos and don'ts of ML in computer security.
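As an illustration of this kind of reflective experiment, consider the widely reported practice of fitting preprocessing steps on training data only. The sketch below contrasts applying and omitting this single practice; the dataset and model are synthetic placeholders, not taken from any of the sampled studies:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Practice applied: the scaler is fitted on training data only (no leakage).
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
acc_applied = model.score(scaler.transform(X_test), y_test)

# Practice omitted: the scaler is fitted on all data, leaking test statistics.
leaky_scaler = StandardScaler().fit(X)
leaky_model = LogisticRegression().fit(leaky_scaler.transform(X_train), y_train)
acc_omitted = leaky_model.score(leaky_scaler.transform(X_test), y_test)

print(f"practice applied: {acc_applied:.3f}, practice omitted: {acc_omitted:.3f}")
```

Comparing both reported results over many studies and practices would quantify how much omitting a practice distorts the conclusions.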
IV. CURRENT STATUS

This section describes the results achieved so far for each research question in the context of the related work.

Regarding our first research question, in a paper currently under review [38], we used 14 Stack Exchange websites, including Stack Overflow, as data sources. We decided to use this family of Q&A websites because of its popularity in the SE community, which can be seen in the multiple studies that have used Stack Overflow as a data source to analyze different topics in SE, e.g., [39]–[41]. In addition to its popularity in the SE community, Stack Overflow has also been used to study ML-related topics such as expertise and challenges [9], problems and challenges of ML libraries [15], and popular deep learning topics [42].

As a result of selecting the Q&A Stack Exchange websites, filtering posts from them to extract the possible ML practices, and analyzing them, we obtained 157 ML best practices and their taxonomy. The practices were obtained by executing an open-coding procedure in which tags for the different practices were identified and assigned, together with the identification of the ML pipeline stage(s) associated with each possible practice. For stage identification, we used the predefined ML pipeline built by Amershi et al. [17]. As a result of the open-coding process, a list of 187 practices was identified, but only 157 were considered best practices by ML experts after a validation process.

Another outcome of the open-coding process is a four-level taxonomy. The first level of the taxonomy consists of the 10 ML-pipeline stages proposed by Amershi et al. [17]. The second level consists of categories that encompass multiple tasks within each ML pipeline stage, e.g., the learning category in the model training stage. The third level is composed of an action/task that can be performed in each ML stage. The fourth level is the practice itself.
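As an illustration of the taxonomy's structure, a practice entry can be thought of as a four-field record, one field per level. The example values below are invented for illustration; the real stage, category, and task names come from our coding:

```python
from dataclasses import dataclass

@dataclass
class Practice:
    stage: str     # level 1: ML pipeline stage (Amershi et al. [17])
    category: str  # level 2: group of tasks within the stage
    task: str      # level 3: action/task performed in the stage
    text: str      # level 4: the practice itself

# Invented example entry, for illustration only.
example = Practice(
    stage="model training",
    category="learning",
    task="tune hyperparameters",
    text="Use a held-out validation set when tuning hyperparameters.",
)
```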
Upon further analysis of the practices, we identified which ML-pipeline stages had the highest number of practices and which ones had the lowest. On the one side, the ML pipeline stages with the highest number of identified practices were model training and data cleaning, which could indicate an interest of practitioners in those two stages. The interest could be related to (i) model training being the core of the ML pipeline, as it enables the use of a model, and (ii) data cleaning being the stage where data scientists spend most of their time [43].

On the other side, model deployment, model monitoring, and data labeling are the stages with the lowest number of identified practices. Regarding the low number of practices in model deployment and monitoring, this could be because these stages are more related to the "operations" staff [14] (i.e., staff in charge of deploying, operating, and monitoring ML-enabled systems). Regarding the low number of practices identified for the data labeling stage, it could be related to the intrinsic nature of this stage. By this, we mean that this stage is not mandatory in the ML pipeline, as it is only needed when ground truth is required, e.g., in supervised and semi-supervised learning. In addition, sometimes the data used to train models has already been labeled, which could lead to efforts being focused on other ML phases.

Another aspect that we noticed when analyzing the practices identified in the Q&A websites is that they did not cover some specific topics. Ethics is an example of a topic that was not discussed or covered by the identified practices. This could indicate that there is a need to explore other sources of information to find ML best practices, such as technical blogs like the one presented by IBM [44], which presents an ethical framework for ML.

When analyzing the validation by the ML experts, we noticed some aspects worth highlighting. Firstly, the majority of the practices considered good were validated by all the experts, which means that there was unanimous agreement. However, 30 practices were rejected by the experts, as only half or fewer of the experts considered them valid best practices. After inspecting the practices that the ML experts rejected, we noticed that, in most cases, the opinion was divided: half of the experts considered that the practices were not good ones, but the other half considered them good ones. This could mean that those practices with divided opinions were not well known or, without a use case scenario, were not clear. In addition, some practices that were considered contradictory, i.e., practices that indicate opposite actions, were agreed to be good practices by the experts, which could also be an indicator of the need for more context to present an actionable practice.

Concerning RQ2, we are working on a study in which research-track articles from top SE conferences are being analyzed in order to understand the reported and used ML practices. In this study, we are also taking the ML pipeline proposed by Amershi et al. [17] as a reference for categorizing the identified practices. In preliminary results, we have found that the least mentioned stages, i.e., stages for which few practices were identified, are model requirements, model deployment, and model monitoring. The last two were expected, as it is not common to describe or execute those stages in a research study. The first one, model requirements, could instead be considered the basis of a research study that uses ML, as it could define how the models should be built, and not defining it properly could have serious consequences.

Regarding RQ3, as part of the process of presenting a handbook of practices with both perspectives, practitioners' and researchers', we are currently designing an approach/tool that will not only serve as a reference but will also be useful in a practical way. By that, we mean that the practices will be associated and enriched with context, examples, and possible identified limitations. In addition, this tool should present the aforementioned information in a friendly way, which will allow its users to find relevant information without going through an entire book, blog, or research article. For that, we are identifying ways in which practices could be presented more interactively, like the appendix presented by Serban et al. [36] for the SE practices for ML, the practices presented by Google in their "People + AI Guidebook" [30], and the "Deep Learning Tuning Playbook" [45], which focuses on the process of model hyperparameter tuning and is addressed to engineers and researchers. We also take as a reference other white and gray literature mentioned in the related work that presents additional information for each practice, such as use cases, e.g., [29], [37].
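As a first idea of the practical retrieval we are aiming for, the following sketch filters practices by pipeline stage and keyword. It builds on the hypothetical Practice record sketched above; the function and field names are placeholders, not the final tool's API:

```python
def search_practices(practices, stage=None, keyword=None):
    """Return practices matching an ML pipeline stage and/or a keyword,
    so users can jump to relevant practices without reading everything."""
    results = []
    for p in practices:
        if stage and p.stage != stage:
            continue
        if keyword and keyword.lower() not in p.text.lower():
            continue
        results.append(p)
    return results

# e.g., search_practices(all_practices, stage="model training", keyword="validation")
```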
V. TIMELINE

I am currently in the third year of my four-year Ph.D. program. In this third year, I plan to continue working on my research, focusing mainly on RQ2 and RQ3, while also finishing answering RQ1. In my last year, I will focus on RQ4 in order to complete it within the fourth year.

VI. CONCLUSION

As a conclusion of this thesis, a synthesis of all four research questions will be provided, including a proposed tool to retrieve the practices. This will help to reduce the gap of not having a clear handbook of ML practices applied to SE, since the set of validated practices will be oriented to the SE community with practitioners' and researchers' perspectives, complemented with context and SE use cases.
REFERENCES

[1] J. O. Awoyemi, A. O. Adetunmbi, and S. A. Oluwadare, "Credit card fraud detection using machine learning techniques: A comparative analysis," in 2017 International Conference on Computing Networking and Informatics (ICCNI). IEEE, 2017, pp. 1–9.
[2] H. Sebastião and P. Godinho, "Forecasting and trading cryptocurrencies with machine learning under changing market conditions," Financial Innovation, vol. 7, no. 1, pp. 1–30, 2021.
[3] T. Saba, "Recent advancement in cancer detection using machine learning: Systematic survey of decades, comparisons and challenges," Journal of Infection and Public Health, vol. 13, no. 9, pp. 1274–1289, 2020.
[4] S. F. Ardabili, A. Mosavi, P. Ghamisi, F. Ferdinand, A. R. Varkonyi-Koczy, U. Reuter, T. Rabczuk, and P. M. Atkinson, "Covid-19 outbreak prediction with machine learning," Algorithms, vol. 13, no. 10, p. 249, 2020.
[5] B. Roziere, M.-A. Lachaux, L. Chanussot, and G. Lample, "Unsupervised translation of programming languages," Advances in Neural Information Processing Systems, vol. 33, pp. 20601–20611, 2020.
[6] M. Tufano, J. Pantiuchina, C. Watson, G. Bavota, and D. Poshyvanyk, "On learning meaningful code changes via neural machine translation," in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 2019, pp. 25–36.
[7] M. Chui, B. Hall, H. Mayhew, and A. Singla, "The state of AI in 2022," 2022. [Online]. Available: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2022-and-a-half-decade-in-review
[8] R. Bean, "NewVantage Partners releases 2022 Data and AI Executive Survey," Jan 2022. [Online]. Available: https://www.businesswire.com/news/home/20220103005036/en/NewVantage-Partners-Releases-2022-Data-And-AI-Executive-Survey
[9] M. Alshangiti, H. Sapkota, P. K. Murukannaiah, X. Liu, and Q. Yu, "Why is developing machine learning applications challenging? A study on Stack Overflow posts," in 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2019, pp. 1–11.
[10] D. Bone, M. S. Goodwin, M. P. Black, C.-C. Lee, K. Audhkhasi, and S. Narayanan, "Applying machine learning to facilitate autism diagnostics: pitfalls and promises," Journal of Autism and Developmental Disorders, vol. 45, no. 5, pp. 1121–1136, 2015.
[11] S. Biderman and W. J. Scheirer, "Pitfalls in machine learning research: Reexamining the development cycle," 2020.
[12] M. A. Lones, "How to avoid machine learning pitfalls: a guide for academic researchers," CoRR, vol. abs/2108.02497, 2021. [Online]. Available: https://arxiv.org/abs/2108.02497
[13] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison, "Hidden technical debt in machine learning systems," Advances in Neural Information Processing Systems, vol. 28, 2015.
[14] G. A. Lewis, S. Bellomo, and I. Ozkaya, "Characterizing and detecting mismatch in machine-learning-enabled systems," in 2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN), 2021, pp. 133–140.
[15] M. J. Islam, H. Nguyen, R. Pan, and H. Rajan, "What do developers ask about ML libraries? A large-scale study using Stack Overflow," ArXiv, vol. abs/1906.11940, 2019.
[16] A. Hamidi, G. Antoniol, F. Khomh, M. Di Penta, and M. Hamidi, "Towards understanding developers' machine-learning challenges: A multi-language study on Stack Overflow," in 2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 2021, pp. 58–69.
[17] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, "Software engineering for machine learning: A case study," in 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2019, pp. 291–300.
[18] D. Creg and T. Baker, "Smarter humans. Smarter machines," 2019.
[19] B. Thormundsson, "Machine learning challenges 2021," Apr 2022. [Online]. Available: https://www.statista.com/statistics/1111249/machine-learning-challenges/
[20] E. Halilaj, A. Rajagopal, M. Fiterau, J. L. Hicks, T. J. Hastie, and S. L. Delp, "Machine learning in human movement biomechanics: Best practices, common pitfalls, and new opportunities," Journal of Biomechanics, vol. 81, pp. 1–11, 2018.
[21] M. Zinkevich, "Rules of machine learning: Best practices for ML engineering," Sep 2021. [Online]. Available: https://developers.google.com/machine-learning/guides/rules-of-ml
[22] K. Fürlinger, "Workshop on machine learning techniques for software development and optimization (MLOPT 2023)," 2023. [Online]. Available: https://www.mlopt-workshop.org/
[23] CAIN 2022, "CAIN 2022 - International Conference on AI Engineering - Software Engineering for AI," 2022. [Online]. Available: https://conf.researchr.org/home/cain-2022
[24] ICSE 2022, "Workshop on Software Engineering for Responsible Artificial Intelligence (SE4RAI)," 2022. [Online]. Available: https://conf.researchr.org/home/icse-2022/se4rai-2022
[25] "AI-assisted code companion workshop," 2022. [Online]. Available: https://sites.google.com/view/a2c2cwspr/home?authuser=0
[26] "Maltesque 2022," 2022. [Online]. Available: https://maltesque2022.github.io/
[27] ESEC/FSE, May 2022. [Online]. Available: https://easeai.github.io/
[28] J. Soldani, "Grey literature: A safe bridge between academy and industry?" ACM SIGSOFT Software Engineering Notes, vol. 44, no. 3, pp. 11–12, 2019.
[29] B. Wujek, P. Hall, and F. Günes, "Best practices for machine learning applications," SAS Institute Inc., 2016.
[30] G. PAIR, "People + AI Guidebook," May 2021. [Online]. Available: https://pair.withgoogle.com/guidebook/
[31] A. Horneman, A. Mellinger, and I. Ozkaya, "AI engineering: 11 foundational practices," Carnegie Mellon University, Pittsburgh, United States, Tech. Rep., 2020.
[32] E. Clemmedsson, "Identifying pitfalls in machine learning implementation projects - a case study of four technology-intensive organizations," M.A. thesis, Royal Institute of Technology (KTH), Stockholm, May 2018.
[33] A. E. Teschendorff, "Avoiding common pitfalls in machine learning omic data science," Nature Materials, vol. 18, no. 5, pp. 422–427, 2019.
[34] C. Tantithamthavorn and A. E. Hassan, "An experience report on defect modelling in practice: Pitfalls and challenges," in Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, 2018, pp. 286–295.
[35] E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley, "The ML test score: A rubric for ML production readiness and technical debt reduction," in 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017, pp. 1123–1132.
[36] A. Serban, K. van der Blom, H. Hoos, and J. Visser, "Adoption and effects of software engineering best practices in machine learning," in Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2020, pp. 1–12.
[37] D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro, and K. Rieck, "Dos and don'ts of machine learning in computer security," in Proc. of USENIX Security Symposium, 2022.
[38] A. Mojica-Hanke, A. Bayona, M. Linares-Vásquez, S. Herbold, and F. A. González, "What are the machine learning best practices reported by practitioners on Stack Exchange?" [Online]. Available: https://arxiv.org/abs/2301.10516
[39] H. Zhang, S. Wang, H. Li, T.-H. Chen, and A. E. Hassan, "A study of C/C++ code weaknesses on Stack Overflow," IEEE Transactions on Software Engineering, vol. 48, no. 7, pp. 2359–2375, 2021.
[40] S. Mondal, G. Uddin, and C. Roy, "Automatic prediction of rejected edits in Stack Overflow," Empirical Software Engineering, vol. 28, no. 1, pp. 1–43, 2023.
[41] P. Chatterjee, M. Kong, and L. Pollock, "Finding help with programming errors: An exploratory study of novice software engineers' focus in Stack Overflow posts," Journal of Systems and Software, vol. 159, p. 110454, 2020.
[42] J. Han, E. Shihab, Z. Wan, S. Deng, and X. Xia, "What do programmers discuss about deep learning frameworks," Empirical Software Engineering, vol. 25, no. 4, pp. 2694–2747, Jul 2020.
[43] Anaconda Inc., "State of data science 2021 - on the path to impact," 2022.
[44] IBM Cloud Education, "AI ethics," 2021. [Online]. Available: https://www.ibm.com/cloud/learn/ai-ethics#toc-establishi-mewvd8CY
[45] V. Godbole, G. E. Dahl, J. Gilmer, C. Shallue, and Z. Nado, "Deep learning tuning playbook," 2023. [Online]. Available: https://github.com/google-research/tuning_playbook