American Journal of Applied Sciences

Literature Reviews

Machine Learning Software Architecture and Model Workflow. A Case of Django REST Framework

1Kennedy Ochilo Hadullo and 2Daniel Makini Getuno
1Institute of Computing and Informatics, Technical University of Mombasa, Mombasa, Kenya
2Department of E-learning, School of Education, Egerton University, Njoro, Kenya

Article history
Received: 15-02-2021
Revised: 24-05-2021
Accepted: 04-06-2021

Corresponding Author:
Kennedy Ochilo Hadullo
Institute of Computing and Informatics, Technical University of Mombasa, Mombasa, Kenya
Email: [email protected]

Abstract: The purpose of this study was to find out the challenges facing Machine Learning (ML) software development and to create a design architecture and a workflow for successful deployment. Despite the promise of ML technology, more than 80% of ML software projects never make it to production. As a result, the majority of companies around the world with investments in ML software are making significant losses. Current studies show that data scientists and software engineers are concerned by the challenges involved in these systems, such as: Limited qualified and experienced ML software experts, lack of collaboration between experts from the two domains, lack of published literature on ML software development using established platforms such as Django REST Framework, as well as the existence of cloud software tools that are difficult to use. Several attempts have been made to address these issues, such as coming up with new software models and architectures, frameworks and design patterns. However, given the lack of a clear breakthrough in overcoming the challenges, this study investigates the conundrum further with a view to proposing an ML software design architecture and a development workflow. In the end, the study gives a conclusion on how the remedies provided help to meet the objectives of the study.

Keywords: Machine Learning, Data Science, Software Engineering, Development, Deployment, Django REST Framework, Architecture, Workflow

Introduction

Artificial Intelligence (AI) has become an important area of research in the 21st Century in many fields including: Marketing, education, banking, finance, agriculture, healthcare, space exploration, autonomous vehicles, law and so forth (Keshari, 2020; Hull, 2020). Besides, AI has also long been a major focus for tech leaders such as Facebook, Amazon, Microsoft, Google and Apple (FAMGA), who have all been aggressively acquiring AI startups in an attempt to integrate machine learning into their products and services (Pathak, 2017).

It is noteworthy that FAMGA have announced a shift from a mobile-first world to an AI-first world (Allad, 2016). The shift implies that the focus of Information and Communication Technology (ICT) has moved from optimizing user experience through mobile phone interfaces to maximizing predictive accuracy through the use of AI.

The AI domain consists of several subfields, such as Machine Learning (ML), Deep Learning (DL), natural language processing, image processing and data mining, which are also important topics in computing research and technology industries (Zhang and Tsai, 2005; Zhang et al., 2019). ML is an application of AI that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed.

Despite the interest caused by ML due to its wide applications and benefits in computing technology, DL, a subfield of machine learning, is attracting much attention as well. DL uses artificial neural networks to mimic the workings of the human brain in processing data and creating patterns for use in decision making.

However, despite the potential created by both ML and DL in data science projects, there is evidence that the majority of the projects do not make it to production (Redapt Marketing, 2019; Ameisen, 2020), with failure rates of up to approximately 90% being reported.

© 2021 Kennedy Ochilo Hadullo and Daniel Makini Getuno. This open access article is distributed under a Creative
Commons Attribution (CC-BY) 4.0 license.
Kennedy Ochilo Hadullo and Daniel Makini Getuno / American Journal of Applied Sciences 2021, Volume 18: 152.164
DOI: 10.3844/ajassp.2021.152.164

Motivation

Although significant strides have been made in the field of ML software development, there is still a considerable number of pitfalls that slow down the application of the technology in investments worldwide. Some of the issues that have motivated this study are stated below.

Firstly, most papers on ML published by students undertaking masters and PhD studies in computer science and Information Technology have been implemented using the Python programming language and deployed using Tkinter (Grayson, 2000), a graphical user interface toolkit for Python desktop applications.

Secondly, the use of REST frameworks (Django or Flask) for machine learning web applications is quite complex and requires a good architecture and a clear workflow, which are currently lacking (Jordon, 2019).

Thirdly, there is a general lack of clear software engineering principles for successfully integrating ML models and web applications.

Finally, the reportedly high failure rate of ML software projects of up to 80% calls for further research into how the design, development and deployment may be enhanced to improve ML software engineering.

Background

Despite the perceived benefits of ML applications, the process of developing, deploying and continuously improving them is more complex compared to traditional software, such as web services and mobile applications (Geron, 2019; Chen, 2015). Deployment, or simply putting models into production, implies making them available to others, whether that be users, management, or other systems. When successfully deployed, ML projects enable users to send data and get their predictions accurately via web or mobile interfaces.

During the development of ML software, there are three major tasks undertaken by the developers: The creation of the ML model, the design of a web application for running the model and the successful deployment of the product as intelligent software (Chen et al., 2020; Li et al., 2015; Washizaki et al., 2019). These tasks are quite complex and demanding and require the relevant skills and inputs from both ML Engineers and Software Engineers.

Task one requires a thorough knowledge of ML modeling using a machine learning programming language such as Python. Task two requires knowledge of web development using a REST framework and the integration of the model with the web application. Finally, task three involves successful deployment of the application with reliable outputs.

The challenges faced by ML engineers have resulted in more research being conducted in this area with the view of alleviating the challenges mentioned. As a result, new software design patterns and new platforms for development have emerged (Ameisen, 2020; Zhang and Tsai, 2005; Zhang et al., 2019; Geron, 2019). However, these platforms have both advantages and disadvantages. The advantages include: Better data visualization, scalability, pipelining and code debugging options. On the flipside, the use of these tools requires fundamental knowledge of advanced calculus and linear algebra along with a good understanding of web-based software engineering in order to create sustainable ML software.

Secondly, the field of data science is known to focus mainly on ML algorithm writing and model development using data mining software such as WEKA, RapidMiner and Orange (Mikut and Reischl, 2011) and/or ML programming languages such as Python and R (Moroney, 2020), preferably with labeled data, having minimal dimensionality and optimizing performance and accuracy of the model (Schröer et al., 2021).

Another cause of concern in ML software development is that the principles used in software engineering and ML modeling are quite divergent: While ML is concerned more with algorithm writing, testing and accuracy issues, software engineering deals mainly with scalability, extensibility, configuration, consistency, modularity and security issues (Sculley et al., 2015). It is thus difficult to produce software that seamlessly combines constraints from both domains.

Lastly, there is no clear formula or procedure for the integration of ML models with web applications created with Django or Flask. This implies that while a majority of data scientists are good at creating ML models using data mining tools, very few are good at creating the same models using languages such as R or Python.

The problem is further compounded by the need to design and develop a web application and merge it with an ML application as one application (Plonski, 2019; Bajpai, 2020).

Purpose

The purpose of the study was to identify the challenges that hinder the development and deployment of Machine Learning software models and thereafter create a Software Architecture and a Deployment Workflow implementable using Python's Django REST Framework (DRF).

Study Objectives

Identify the challenges facing data scientists and software engineers during Machine Learning Software Development and Deployment:

i) Develop a suitable Machine Learning Software Architecture that is deployable with Python's DRF

ii) Integrate a Machine Learning Software Deployment Workflow based on the software architecture created in objective (i)

In order to answer the study objectives, we propose to come up with a Software Design Architecture (SDA) to better understand the basic structure of ML software and a Software Deployment Workflow (SDW) to guide the development and deployment of ML software and help overcome the challenges identified in the study.

Literature Review

The study reviewed the relevant literature by using the framework proposed by Murad (2020) and illustrated in Fig. 1. By applying this framework, we decided to use a systematic literature review and scoped the existing literature on ML software to help us define the Research Problem (RP). Once this was done, the RP was specified in a clear and structured manner by framing it using specific keywords.

Some of the keywords used included machine learning software development, machine learning software deployment, machine learning engineering, machine learning web applications, data science engineering, machine learning software architecture, machine learning software workflow, Django REST framework and the challenges of deploying machine learning models.

To capture as many relevant articles as possible, a range of journals, books and grey literature in the mentioned areas were searched extensively to identify whether they contained articles having these keywords. In total, twenty-five (25) journals, sixteen (16) books and thirteen (13) grey literature sources were scoped. Out of these, only 18 journals, 15 books and 8 grey literature sources were found to be relevant for review.

Some of the journals included were: Journal of Systems and Software, SSRN Electronic Journal, International Journal for Research in Applied Science and Engineering Technology, Journal of Data Warehousing and Journal of Systems, Software and Wiley Online Library. The review enabled us to identify some of the processes, models, frameworks and related work within the scope of the study topic, as described in the next sections.

[Figure 1 is a flowchart. Recoverable box labels: Start; Decide on type of review; Scope the existing literature; Define study purpose; Select your resources: databases and grey literature; Choose your search terms: keywords and subject headings; Test; Make a note of results: how many results have you found from each resource; decision: are all the records found? (Yes/No); Title and abstract screening; Write up your findings; Submit manuscript for publication; Stop.]

Fig. 1: Literature review flowchart (Murad, 2020)


Machine Learning as a Model (MLaaM)

MLaaM is the output of writing ML algorithms that run on data and represents what was learned by the algorithm on training data. An algorithm in ML is a procedure that is run on data to create a machine learning model. Examples of ML algorithms include: K-nearest neighbors for classification, linear regression for regression and k-means for clustering (McClendon and Meghanathan, 2015).

The model is a file that is saved after running the algorithm and represents the data, the rules and the procedures for using the data to make a prediction (Geron, 2019). The most popular programming language for MLaaM is Python, while TensorFlow (TF) is the software framework most preferred by developers for both DL and ML (Jaxenter, 2018).

ML models can be created using three techniques: Supervised learning, unsupervised learning and reinforcement learning. Supervised learning algorithms, which are the most common, are trained using labeled examples, such as an input where the desired output is known, while unsupervised learning is used against data that has no historical labels (Sharma, 2020).

Machine Learning as a Service (MLaaS)

Machine learning as a service (MLaaS) refers to a number of services that offer machine learning tools as part of cloud computing services (Singh, 2021; Geron, 2019). The main benefit of these tools is that customers can get started with machine learning applications quickly without installing specific software or provisioning their own servers. MLaaS providers offer services for the development and deployment of ML software projects that allow: Data transformation, predictive analytics, data visualization and advanced ML algorithms (Geron, 2019; Zhang and Tsai, 2005; Zhang et al., 2019; Singh, 2021).

MLaaS providers normally guarantee their clients all stages of the machine learning process, including data storage and management, model development and deployment, performance monitoring and support, and ensuring maximum efficiency of the whole machine learning process (Zhang and Tsai, 2005; Zhang et al., 2019). Different providers may vary slightly in their cloud services; however, most of them offer environments that can be used to prepare data, train, test, deploy and provide performance monitoring. Some of the popular providers include Amazon Web Services (Bankar, 2018), Google (Sanderson, 2012), IBM (Miller, 2019), Microsoft Azure (Ranjeetsingh, 2014) and Uber (Oppegaard, 2021).

ML Model Software Deployment

Software deployment is all of the activities that make a software system available for use. It is the mechanism through which application modules are delivered from developers to users. The methods used by developers to build, test and deploy new code will impact how fast a product can respond to changes in customer preferences or requirements and the quality of each change (Fitzgerald and Stol, 2017).

In the context of ML, the process of taking a trained model and making its predictions available to users is known as deployment. However, ML deployment is not well understood among data scientists who lack backgrounds in software engineering. Conversely, most software engineers are not proficient in ML model development. Plonski (2019) highlighted four methods of deployment, outlining the requirements, merits and demerits of each. The methods are summarized in Table 1.

Table 1: Different ways of deploying ML models. Adopted from Plonski (2019)

SN | Deployment method | Requirements | Advantages | Disadvantages | Comment
1 | Locally (laptop or computer) | Jupyter Notebook, RStudio or Weka | Simple to implement; predictions on ML code | Hard to govern, monitor, scale and collaborate | Not recommended for production
2 | Hard-code the ML algorithm in the system's code | Jupyter Notebook, RStudio or Weka; software program | Can be used with simple ML algorithms, like decision trees or linear regression | Hard to govern, monitor, scale and collaborate | Not recommended for production
3 | Use of REST API or WebSockets | Jupyter Notebook, RStudio or Weka; software framework | All requirements for the ML production system can be fulfilled | Requires data scientist and software engineer | Recommended for production
4 | Use of a commercial cloud vendor for deployment | Colab PDE or Jupyter Notebook on laptop | All requirements for the ML production system can be fulfilled | Requires data scientist and software engineer | Recommended for production
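As a rough illustration of method 3 in Table 1, the sketch below trains a tiny model, saves it as a file (the "model" in the MLaaM sense) and serves predictions from a JSON request body, which is the core of a REST-style deployment. It is a minimal sketch under stated assumptions, not the architecture proposed in this study: it uses only the Python standard library so that it runs without Django or DRF, and the names `train_linear_model`, `save_model`, `load_model`, `predict_endpoint` and the `{"x": ...}` payload are hypothetical. In a DRF project the body of `predict_endpoint` would typically sit inside a view.

```python
import json
import pickle
import tempfile
from pathlib import Path


def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model, path):
    """Persist the trained model as a file (the artifact handed to engineers)."""
    Path(path).write_bytes(pickle.dumps(model))


def load_model(path):
    """Load a model file; only unpickle files from a trusted source."""
    return pickle.loads(Path(path).read_bytes())


def predict_endpoint(model, request_body):
    """Handle the JSON body of a hypothetical POST /predict request.

    Returns an (HTTP status code, JSON response body) pair.
    """
    try:
        x = float(json.loads(request_body)["x"])
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a numeric 'x'"})
    prediction = model["slope"] * x + model["intercept"]
    return 200, json.dumps({"prediction": prediction})


if __name__ == "__main__":
    model_path = Path(tempfile.mkdtemp()) / "demo_model.pkl"
    save_model(train_linear_model([1, 2, 3, 4], [3, 5, 7, 9]), model_path)
    print(predict_endpoint(load_model(model_path), '{"x": 10}'))
```

Keeping training, persistence and request handling in separate functions means the data scientist and the software engineer can each own one part, which is the collaboration requirement Table 1 lists for this method.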


Django REST Framework (DRF)

Django Representational State Transfer (REST) Framework is a free and open source high-level Python web framework that encourages rapid development and clean, pragmatic design. DRF is a powerful and flexible toolkit used for rapidly building web applications based on Django database models (Jordon, 2019; Bajpai, 2020) with the following advantages: Secure, scalable, customizable applications with serialization that supports both Object Relational Mapping (ORM) and non-ORM data sources (Jordon, 2019; Bajpai, 2020). Given that most ML models are created using the Python programming language, DRF is a preferred platform for ML software development.

ML Model Software Architecture (MMSA)

The ML software application building process is a complex process that brings together several components constituting the software engineering life cycle: Requirement engineering, analysis, design, development, testing, deployment and maintenance (McGovern et al., 2004). Thus, there is need for a software architecture that supports the ML model component and the web application components without negatively affecting the performance of the software (Binge, 2020).

IEEE CS (2000) defines Software Architecture (SA) as the fundamental organization of a software system embodied in its components, their relationships to each other and the principles guiding its design and evolution. The SA for this study will consist of the following components: The architectural pattern, which defines the granularity of a component; system interaction, which defines how the components communicate with each other; and software quality attributes such as scalability, extensibility, maintainability, portability, adaptability and resilience.

However, it is important to note that the type of architecture used in a software system is normally determined by the project objectives, the proposed budget, the developer team skillset, infrastructure limits and the stakeholders' interests (Binge, 2020).

Machine Learning Operations

Machine Learning Operations or "MLOps" is defined as the practice of collaboration between data scientists and software engineers in automatically managing the deployment of ML and DL software lifecycles (Wang, 2019). MLOps can be manual or automatic.

The manual MLOps process illustrated in Fig. 2 is an entirely manual process that includes data analysis, data preparation, model training and validation in Jupyter Notebook by data scientists. The data scientists hand over a trained model as an artifact to the software engineering team for deployment by putting the trained model in a code repository (Singh, 2021). The software engineers deploy the model as a prediction service using a microservice architecture with REST APIs. The workflow of this process is illustrated in Fig. 2.

Related Work

A study by Runyu (2020) to create a design pattern for ML deployment ascertained that although data scientists have come up with many good algorithms and trained models, putting those models into production is still a challenge. The key obstacles hindering ML software production are: Lack of a clear methodology for moving ML models to production, use of monolithic programming or lack of modularization when writing ML code and obscure best practices in ML software development.

Runyu (2020) developed a system design pattern named Model-Service-Client + Retraining (MSC/R) in order to overcome these challenges (Fig. 3). This design pattern incorporates the principles of modularization and separation of concerns and uses a microservice RESTful API architecture. Figure 3 illustrates the architecture.

The MSC/R design pattern works by using three teams of distinct developers: Data scientists working on the model, MLOps engineers working on the service and client developers working on the front end. The next important part of the design is the connectors linking the four main system components: Model, service, retraining and client. The connectors' main function is to provide guidelines for collaboration between the system components during development.

In a related study by O'Leary and Uchida (2020) to identify the common problems with creating ML pipelines from existing code, data were collected via face-to-face meetings in coding workshop settings averaging 100 companies, data scientists, researchers, ML platform owners and software engineers. The companies interviewed were in the process of transforming their business through the use of ML.

The projects involved migrating existing ML models to MLaaS using Kubeflow Pipelines (KFP) and TensorFlow Extended (TFX). The study identified three problems:

Firstly, due to the highly iterative nature of ML model development, the coding does not usually follow object-oriented principles such as modularization and code re-use, making it unsuitable for deployment using software engineering principles. As a result, engineers often need to re-implement the model from scratch as deployable software. During the re-implementation, many of the implicit assumptions made by data scientists for modeling get lost, resulting in unexpected inconsistencies and issues in production.


Secondly, most ML model developments use "monolithic programming approaches", i.e., building applications that are "single-tiered" in nature. Single-tier architecture, when used in ML, combines data with business logic and user interface code in a single logical structure. This results in a tightly coupled application that becomes inefficient to run and difficult to maintain.

In a related study by Sculley et al. (2015) that set out to explore several specific design risk factors to account for in ML software deployment, the output was the Technical Debt Framework (TDF) illustrated in Fig. 4. Technical debt is an analogy used to describe a situation in software development where a workaround is used to solve a software problem (Kruchten et al., 2012; Zazworka et al., 2011). Several technical problems (debts) and potential workarounds (repayment approaches) were identified and used to create the TDF (Fig. 4).

Default in payment of technical debts may hinder successful deployment. The debts include issues related to: Design, coding, testing, documentation, versioning and infrastructure. Repayment can be done via: Automation, re-writing, refactoring, re-engineering, re-packaging, bug fixing and improving documentation. Repayment results in improved software quality. ML systems have a tendency to incur technical debts because of the already stated problems related to the domains of ML and software engineering.

Another study by Chen et al. (2020), which set out to identify the challenges in deploying DL software, proposed an ML deployment process consisting of four phases: DL data collection, DL model training, model conversion and exportation to TensorFlow, and platform configuration and deployment (Fig. 5).

The DL software Development and Deployment Model (DDDM) has two facets: DL software development and DL software deployment. The first facet makes use of TensorFlow (TF) and Keras to integrate models into software applications for real usage after validation and testing. The second facet involves deploying the model on a cloud-based server platform such as AWS SageMaker or Google Cloud.

[Figure 2 is a pipeline diagram. Recoverable labels: Offline Data; Data Extraction and Analysis; Data Preparation; Model Training; Model Evaluation and Validation; Trained Model; Model Registry; Model Serving; Prediction Service; ML Ops; stages grouped as Experimentation/Development/Test and Staging/Preproduction/Production.]

Fig. 2: Manual MLOps deployment process (Singh, 2020)

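[Figure 3 is a block diagram. Recoverable labels: three users (Data Scientists, MLOps Engineers, Client Developers); components Model, Service, Client and Retraining; MS Connector between Model and Service; SC Connector between Service and Client.]

Fig. 3: The Model-Service-Client + Retraining (MSC/R) design pattern. Source: Adopted from Runyu (2020)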


[Figure 4 is a three-part diagram. Recoverable content:
Technical debts and descriptions: Requirements TD (requirement not specified); Architectural TD (violations of good architectural practice); Design TD (incomplete design specification); Code TD (low quality code, duplicate code and coding violations); Test TD (differing testing, lack of tests, lack of test automation); Build TD; Documentation TD (incomplete and insufficient documentation); Infrastructure TD (old infrastructure, lack of integration and lack of automated deployment); Versioning TD (unnecessary code forks, multi-version support); Defect TD (defects and bugs).
Repayment approaches: Refactoring, Re-writing, Automation, Re-engineering, Re-packaging, Bug Fixing, Fault Tolerance, Documentation.
Quality software attributes: Efficiency, Functionality, Usability, Security, Reliability, Maintainability, Compatibility.]

Fig. 4: The technical debt framework. Source: Adopted from Li et al. (2015)

The deployment challenges identified include: Converting models to platform formats, configuration errors encountered during integration, limited skills in ML software development and data processing challenges when converting raw data into the input format needed by the model software. To obtain the data relevant for the study, over 3,023 posts from Stack Overflow (2020), specifically about TensorFlow Serving, Google Cloud ML and Amazon SageMaker, were collected and analyzed.

In another related study, Esmaeilzadeh (2017) designed an architecture and developed a testable, scalable and efficient web-based application that models and implements machine learning applications in cancer prediction. The main components that formed the architecture of the system included a server, a database, a programming language, the Django web framework, front-end design, testability, scalability, performance and design pattern (Fig. 6).

The data set for the study's application was a subset of the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute. The application was implemented with Python as the back-end programming language, Django as the web framework and MySQL as the database server. The front layer of the application was built using HTML, CSS and JavaScript.

The study used automated testing approaches to ensure the following: Making sure the application is working as expected before deployment, ensuring that new functionalities do not change the behavior of the application in unexpected ways, finding and fixing bugs and testing the performance of the application under heavy loads.

Washizaki et al. (2019) embarked on a study with the purpose of collecting, classifying and discussing the best practices for designing quality and complex ML systems (Fig. 7). The study set out to collect good and bad design patterns for ML software so as to provide developers with a comprehensive classification of such patterns. By using a questionnaire-based survey, the study established that there is a lack of expertise among ML engineers on the development of architectures and design patterns. The study formulated a design pattern based on the Model View Controller (MVC) pattern having three layers: The Presentation Layer, the Logic Layer and the Data Layer.
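The automated testing goals listed above for Esmaeilzadeh (2017) (expected behavior before deployment, no unexpected behavior changes, and performance under load) can be sketched with Python's built-in `unittest` module. This is a minimal sketch under stated assumptions and not the test suite from that study: `classify` is a hypothetical stand-in for a deployed model, its threshold rule is invented for illustration, and the two-second load budget is arbitrary.

```python
import time
import unittest


def classify(features, threshold=1.0):
    """Hypothetical stand-in for a trained classifier (rule invented here)."""
    if not features:
        raise ValueError("features must not be empty")
    return "positive" if sum(features) >= threshold else "negative"


class PredictionTests(unittest.TestCase):
    def test_expected_behaviour(self):
        # The application works as expected before deployment.
        self.assertEqual(classify([0.2, 0.3]), "negative")
        self.assertEqual(classify([0.7, 0.6]), "positive")

    def test_rejects_invalid_input(self):
        # Bugs caused by malformed input surface as explicit errors.
        with self.assertRaises(ValueError):
            classify([])

    def test_regression_on_known_cases(self):
        # Frozen input/output pairs guard against unexpected behaviour changes.
        known = {(0.5, 0.5): "positive", (0.1,): "negative", (2.0,): "positive"}
        for features, expected in known.items():
            self.assertEqual(classify(list(features)), expected)

    def test_throughput_under_load(self):
        # Crude load check: many predictions within an arbitrary time budget.
        start = time.perf_counter()
        for i in range(10_000):
            classify([(i % 3) * 0.5, 0.25])
        self.assertLess(time.perf_counter() - start, 2.0)


def run_suite():
    """Run the tests programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

In a Django project the same cases would usually extend `django.test.TestCase` and exercise the prediction endpoint through the test client rather than calling the function directly.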


[Figure 5 is a flow diagram in two parts. DL Software Development: DL Data Collection; DL Framework (TensorFlow and Keras); Trained DL Models. DL Software Deployment: Exported Model for Server/Cloud Platform (Server, Cloud); Converted Models for Mobile Platform (Mobile Interface); Exported Models and Converted Models for Browser.]

Fig. 5: DL software Development and Deployment Model (DDDM). Source: Adopted from Chen et al. (2020)

[Figure 6 is a two-part diagram. Deployment Model using Django Framework: User; Web or Mobile GUI; Template (display logic); View (business logic); Model (Object Relational Mapping, ORM) with create, update and delete operations; MySQL DB. ML Model for Cancer Prediction: Identification of Required Data; Data Pre-Processing; Definition of Training Set; Algorithm Selection; Data for Training; Parameter Tuning; Evaluation With Test Set; decision OK? (Yes/No); Classifier; Python Serialised Object.]

Fig. 6: ML software deployment architecture. Adopted from Esmaeilzadeh (2017)

[Figure 7 is a layered diagram. Architectural layers: Presentation Layer (User Interface); Logic Layer (Business Specific Logic; Machine Learning Specific Logic with Data Processing, Inference Engine and ML Runtime); Data Layer (Database, Data Lake, Data Collection from the Real World). Arrows are labeled Business Logic Data Flow and ML Runtime Data Flow.]

Fig. 7: Software engineering design pattern for ML software systems. Adopted from Washizaki et al. (2019)


Table 2: Summary of literature review

SN | Model/pattern/framework and author | Advantages | Disadvantages
1 | Model-Service-Client + Retraining (MSC/R) design pattern (Runyu, 2020) | Incorporates the principles of modularization and separation of concerns and uses a microservice RESTful API architecture | Does not clearly show how the model, service and client components are integrated
2 | A test driven approach to develop web-based machine learning applications (Esmaeilzadeh, 2017) | Uses automated testing approaches to make sure the application is working as expected before deployment | No clear explanation of how the rest of the application was developed using Python, Django, MySQL, CSS and JavaScript; no mention of how deployment was done
3 | DL software deployment model (Chen et al., 2020) | Uses TensorFlow and Keras to integrate models into software applications; deploys the model on a cloud-based server platform such as AWS SageMaker | No clear methodology on how TensorFlow and Keras were used; no clarity on how deployment was done
4 | Software engineering design pattern for designing machine learning systems (Washizaki et al., 2019) | Exposed the lack of expertise among ML engineers on ML software development; created an MVC for ML software | No clear methodology on how the logic, data and presentation layers were created, integrated and deployed together with the ML model
5 | The Technical Debt Framework (adopted from Li et al., 2015) | Identified some of the risks in ML software deployment, called technical debts; identified debt repayment approaches | No mention of an architecture or a deployment workflow
6 | Common problems with creating machine learning pipelines from existing code (O'Leary and Uchida, 2020) | Used Kubeflow Pipelines (KFP) and TensorFlow Extended (TFX) | Methodology on both development and deployment not clear

Summary of Literature Review

After a comprehensive literature review, the results are summarized by the model or framework reviewed, in terms of the advantages and disadvantages of each (Table 2).

Proposed ML Software Model Deployment Architecture (DFMSA)

The proposed architecture describes the major components of both the ML model and the Django part, their relationships (structures) and how they interact with each other. This architecture is known as the Django Framework ML Software Architecture (DFMSA). The DFMSA consists of six sub-architectures (SAs): the user interface component, the Django API (application) component, the configuration files component, the serialization/de-serialization component, the server/repository component and the command line utility component (Fig. 8).

SA1: User Interface

The user interface connects the admin and the normal user with the system through the Admin Panel and the Client Interface. Beneath this SA lie the static and template folders containing the CSS, HTML, JavaScript and JSON files. The SA connects with the rest of the application through the application's URLs file.

SA2: Django API

The Django API is made up of the following files: views.py for logic, models.py for database code, apps.py for application configuration, urls.py for providing paths, admin.py for administrative functions and tests.py for writing tests. All the files work in conjunction to make the application accept user data and give predictions.

SA3: System Configuration Files

The configuration files, such as settings.py and urls.py, are vital in linking the system files together. For example, they are useful in creating paths and importing files, linking the static and template files, defining database credentials and middleware components and linking the installed apps and security key.

SA4: Serialization/De-Serialization

Object serialization is the process of saving an ML model, for example as a Pickle or Joblib file, or by manually saving and restoring it using a JSON approach. Serialization represents an object as a stream of bytes in order to store it on disk, send it over a network or save it to a database. Deserialization is the reverse process: restoring the serialized (for example, pickled) ML model into a usable object, such as reloading it in a Jupyter Notebook (IPYNB) session.

SA5: Server and Repository

Heroku is a cloud Platform as a Service (PaaS) supporting several programming languages such as Ruby, Java, Node.js, Scala, Python and PHP. One advantage of Heroku is that if the project has already been pushed to GitHub, automatic deployments can easily be set up from the project's GitHub repository via the Heroku dashboard.

SA6: Command Line Utility

The command line utility contains two major utilities: manage.py, a command-line utility that lets you interact with this Django project in various ways, and django-admin.py, Django's command-line utility for administrative tasks.

Proposed ML Software Model Deployment Workflow (SMDW)

The proposed SMDW is derived from the proposed architecture and the literature review summary (Table 2).


From this table, six factors were identified and turned into six phases: start, build ML model, build Django app, integrate model into the application (app), make predictions and test user responses using two variants, variant A versus variant B, also known as A/B testing (Fig. 9).

Phase 1: Start

During this phase, the software engineer starts by setting up a GitHub account, installing the Python virtual environment, creating a Django project and adding application files into the project, followed by committing the code into the GitHub Repository (GHR). This is in preparation for the software engineering part of the project.

Phase 2: Build ML Model

During this phase, an ML engineer or a data scientist installs Jupyter Notebook and installs and loads all the initial packages required for the project. This is followed by loading and pre-processing the data file and then writing, training and saving the algorithms before adding the code into the GHR.
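The train, save and restore steps of Phase 2 can be illustrated with a minimal sketch. It uses the JSON approach to serialization rather than Pickle or Joblib so that the saved file is human-readable. The `ThresholdClassifier` class, the `save_model` and `load_model` helpers and the toy data are hypothetical stand-ins introduced for illustration; they are not taken from the study.

```python
import json
import tempfile
from pathlib import Path


class ThresholdClassifier:
    """Hypothetical one-feature classifier standing in for a trained model."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, values, labels):
        # Put the decision threshold midway between the two class means.
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, values):
        return [1 if v > self.threshold else 0 for v in values]


def save_model(model, path):
    """Serialize the learned parameters to a JSON file."""
    Path(path).write_text(json.dumps({"threshold": model.threshold}))


def load_model(path):
    """Deserialize: rebuild an equivalent model from the stored parameters."""
    params = json.loads(Path(path).read_text())
    return ThresholdClassifier(threshold=params["threshold"])


if __name__ == "__main__":
    # Toy data (assumed): small values belong to class 0, large to class 1.
    model = ThresholdClassifier().fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.json"
        save_model(model, path)
        restored = load_model(path)
    print(restored.predict([3.0, 7.5]))
```

Storing only the learned parameters keeps the file independent of the Python class layout, at the cost of writing the save and load logic by hand. A Pickle or Joblib file avoids that work but should only be loaded from trusted sources.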

[Figure: Django REST Framework API ML software architecture. User Interface: Admin Panel and Client Side, with static and templates folders. App folder: models.py, admin.py, forms.py, apps.py, tests.py, migrations, views.py, urls.py, serialisers.py. Serialisation/De-serialization: IPYNB file, PKL object, ML algorithm registry. Server/Repository: Heroku Cloud, PostgreSQL, Git repository, deploy. Configuration Files: settings.py, urls.py. Command Line Utility: manage.py, django-admin.py.]

Fig. 8: Proposed system DFMSA, based on Plonski (2019)
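The role of the configuration files sub-architecture (SA3) can be illustrated with a fragment of a settings.py file. This is a sketch under stated assumptions: the setting names are standard Django settings, but the app name `predictions`, the project package `config`, the environment variable names and all default values are hypothetical placeholders.

```python
# settings.py (fragment): all values are illustrative placeholders.
import os
from pathlib import Path

# Django's generated settings.py derives BASE_DIR from __file__; the working
# directory is used here only so that the fragment runs standalone.
BASE_DIR = Path.cwd()

# Security key, read from the environment (variable name is an assumption).
SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "insecure-placeholder-key")

# Installed apps: Django's defaults, Django REST Framework and the project app.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "rest_framework",
    "predictions",  # hypothetical app holding models.py, views.py, urls.py
]

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
]

ROOT_URLCONF = "config.urls"  # hypothetical project package name

# Template files are looked up in the project-level templates folder.
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [BASE_DIR / "templates"],
        "APP_DIRS": True,
        "OPTIONS": {
            "context_processors": [
                "django.template.context_processors.request",
                "django.contrib.auth.context_processors.auth",
                "django.contrib.messages.context_processors.messages",
            ],
        },
    },
]

# Database credentials (PostgreSQL, as drawn in Fig. 8), read from the environment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "mlapp"),
        "USER": os.environ.get("DB_USER", "mlapp"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}

# Static files (CSS, JavaScript) served from the project-level static folder.
STATIC_URL = "/static/"
STATICFILES_DIRS = [BASE_DIR / "static"]
```

Reading the security key and database credentials from environment variables is one common way to keep secrets out of the Git repository; the study itself does not prescribe how they are stored.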

[Figure: six-phase workflow. Phase 1: Start. Phase 2: Build ML Model. Phase 3: Build Django App. Phase 4: Integrate Model into App. Phase 5: Make Predictions. Phase 6: A/B Testing. Each phase lists its steps and ends with adding the code to Git.]

Fig. 9: Proposed ML model deployment workflow, based on Plonski (2019)
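Phases 4 and 6 of the workflow involve an algorithm registry and a comparison of two algorithm variants. The sketch below shows one possible in-memory version of both ideas. The `AlgorithmRegistry` class, the `ab_test` function, the endpoint and variant names and the toy requests are all hypothetical; a Django implementation would typically keep this information in database models instead. Requests are alternated between the variants to keep the example deterministic, whereas a production A/B test would usually assign them at random.

```python
import itertools


class AlgorithmRegistry:
    """In-memory registry mapping an endpoint name to its algorithm variants."""

    def __init__(self):
        self._algorithms = {}

    def add_algorithm(self, endpoint, name, predict_fn):
        # Register a prediction function under an endpoint and a variant name.
        self._algorithms.setdefault(endpoint, {})[name] = predict_fn

    def get(self, endpoint, name):
        return self._algorithms[endpoint][name]

    def names(self, endpoint):
        return list(self._algorithms[endpoint])


def ab_test(registry, endpoint, variant_a, variant_b, requests):
    """Alternate labelled requests between two variants and compare accuracy."""
    stats = {variant_a: [0, 0], variant_b: [0, 0]}  # [correct, served]
    assignment = itertools.cycle([variant_a, variant_b])
    for (features, actual), variant in zip(requests, assignment):
        predicted = registry.get(endpoint, variant)(features)
        stats[variant][0] += int(predicted == actual)
        stats[variant][1] += 1
    accuracy = {
        name: (correct / served if served else 0.0)
        for name, (correct, served) in stats.items()
    }
    winner = max(accuracy, key=accuracy.get)
    return accuracy, winner


if __name__ == "__main__":
    registry = AlgorithmRegistry()
    # Two toy rule-based "algorithms" (assumed) standing in for trained models.
    registry.add_algorithm("classifier", "rule_a", lambda x: int(x["score"] > 0.5))
    registry.add_algorithm("classifier", "rule_b", lambda x: int(x["score"] > 0.8))
    labelled_requests = [
        ({"score": 0.6}, 1),
        ({"score": 0.6}, 1),
        ({"score": 0.9}, 1),
        ({"score": 0.2}, 0),
    ]
    print(ab_test(registry, "classifier", "rule_a", "rule_b", labelled_requests))
```

Comparing accuracy requires the true outcome for each request, which is why every request in the sketch is paired with an actual label; in practice this feedback arrives after the prediction has been served.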


Phase 3: Build Django App

During this phase, the software engineer continues with what was started in Phase 1 by adding the database models, creating the REST APIs for the models, adding DRF serializers, adding views and URLs and adding the code into the GHR.

Phase 4: Integrate ML Model in Django App

During this phase, the software engineer continues with what was done in Phase 3 by writing the ML server code for the model, writing test code, creating a registry, adding algorithms into the registry and then adding the code into the GHR.

Phase 5: Make Predictions

During this phase, the software engineer continues with what was done in Phase 4 by creating views for predictions, creating DB models for tests, creating REST APIs for tests, writing scripts for sending requests and adding the code into the GHR.

Phase 6: A/B Testing

A/B testing in the context of this study is the process of comparing two outputs of the ML software predictions and concluding which of the two outputs or variants is more effective or accurate. Other parts of the project are repeated, such as creating views for predictions, creating DB models for tests, creating REST APIs for tests, writing scripts for sending requests and adding the code into the GHR.

Conclusion and Recommendation

This study investigated challenges that hinder the development and deployment of ML software models in order to create an architecture and a deployment workflow implementable using Python's DRF. After a systematic literature review, the main challenges were found to be: unethical programming practices, a lack of software development skills that integrate both data science and software engineering, difficulty in using software and tools for developing ML software and a lack of a clear methodology for deployment. A suitable ML software architecture and model workflow are also presented as a solution to deployment problems within ML engineering. This study aims to benefit ML software engineers in industry by helping to increase the rate of production, as well as master's and PhD students in IT and computer science by helping them in writing their theses on ML software. It is recommended that the created architecture and deployment workflow be used to deploy an ML software system as a test.

Acknowledgment

This research was made possible by the support provided by the Technical University of Mombasa and Egerton University through journal subscriptions, need-based acquisition and a favorable research environment.

Author's Contributions

Kennedy Ochilo Hadullo: Contributed mainly to the architecture design of the manuscript.

Daniel Makini Getuno: Contributed mainly to the workflow design of the manuscript.

Ethics

This article is original and contains unpublished material. Both the corresponding author and the co-author confirm that they have read and approved the manuscript and that no ethical issues are involved. The authors declare that they have no competing interests.

References

Allad, R. (2016). Moving from a Mobile First to an AI First World. https://unionstreetmedia.com/moving-from-a-mobile-first-to-an-ai-first-world/
Ameisen, E. (2020). Building machine learning powered applications: Going from idea to product. O'Reilly Media, Inc.
Bajpai, S. (2020). Analyzing resume using natural language processing machine learning and django. International Journal for Research in Applied Science and Engineering Technology, 8(5), 2037-2039. doi.org/10.22214/ijraset.2020.5333
Bankar, S. (2018). Cloud Computing Using Amazon Web Services AWS. International Journal of Trend in Scientific Research and Development, 2156-2157. doi.org/10.31142/ijtsrd14583
Binge, S. (2020). The importance of good software architecture. https://www.sitepen.com/blog/the-importance-of-good-software-architecture
Chen, L. (2015). Continuous delivery: Huge benefits, but challenges too. IEEE Software, 32(2), 50-54. doi.org/10.1109/ms.2015.27
Chen, Z., Cao, Y., Liu, Y., Wang, H., Xie, T., & Liu, X. (2020, November). A comprehensive study on challenges in deploying deep learning based software. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 750-762).
Esmaeilzadeh, A. (2017). A Test Driven Approach to Develop Web-Based Machine Learning Applications. UNLV Theses, Dissertations, Professional Papers and Capstones. 3127.


Fitzgerald, B., & Stol, K. J. (2017). Continuous software engineering: A roadmap and agenda. Journal of Systems and Software, 123, 176-189. doi.org/10.1016/j.jss.2015.06.063
Grayson, J. E. (2000). Python and Tkinter programming. Manning Publications Co. Greenwich. http://117.239.19.55:8080/jspui/handle/123456789/230
Geron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras and TensorFlow: Concepts, tools and techniques to build intelligent systems. O'Reilly Media. ISBN-10: 1492032611.
Hull, J. C. (2020). Machine Learning in Business: An Introduction to the World of Data Science. Independently Published. https://www.amazon.com/Machine-Learning-Business-Introduction-Science/dp/B088B8162S
IEEE CS. (2000). Recommended Practice for Architectural Description for Software-Intensive Systems. doi.org/10.1109/ieeestd.2000.91944
Jaxenter. (2018). ML trends in stack overflow developer survey 2018. https://jaxenter.com/ml-trends-stack-overflow-145870.html
Keshari, K. (2020, December 02). Top 10 applications of machine learning: Machine learning applications in daily life. https://www.edureka.co/blog/machine-learning-applications
Jordon, W. (2019). Python django web development: The ultimate django web framework guide for Beginners. Independently Published. https://www.amazon.com/Python-Django-Web-Development-framework/dp/1688542817
Kruchten, P., Nord, R. L., Ozkaya, I., & Visser, J. (2012). Technical debt in software development. ACM SIGSOFT Software Engineering Notes, 37(5), 36-38. doi.org/10.1145/2347696.2347698
Li, Z., Avgeriou, P., & Liang, P. (2015). A systematic mapping study on technical debt and its management. Journal of Systems and Software, 101, 193-220. doi.org/10.1016/j.jss.2014.12.027
McClendon, L., & Meghanathan, N. (2015). Using Machine Learning Algorithms to Analyze Crime Data. Machine Learning and Applications: An International Journal, 2(1), 1-12. doi.org/10.5121/mlaij.2015.2101
McGovern, J., Ambler, S. W., Stevens, M. E., Linn, J., Jo, E. K., & Sharan, V. (2004). A practical guide to enterprise architecture. Prentice Hall Professional. ISBN-10: 0131412752.
Mikut, R., & Reischl, M. (2011). Data mining tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(5), 431-443. doi.org/10.1002/widm.24
Miller, J. (2019). Hands-on machine learning with IBM Watson: Leverage IBM Watson to implement machine learning techniques and algorithms using Python. Packt Publishing Ltd. https://www.amazon.com/Hands-Machine-Learning-IBM-Watson/dp/1789611857
Moroney, L. (2020). Ai and machine learning for coders. O'Reilly Media, Incorporated. https://www.oreilly.com/library/view/ai-and-machine/9781492078180/
Murad, D. F. (2020). Systematic Literature Review (SLR) Approach. doi.org/10.31219/osf.io/v7239.
O'Leary, K., & Uchida, M. (2020). Common Problems with Creating Machine Learning Pipelines from Existing Code. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/b50bc83882bbd29c50250d1e59fbc3afda3fb5e5.pdf
Oppegaard, S. M. N. (2021). Regulating Flexibility: Uber's Platform as a Technological Work Arrangement. Nordic Journal of Working Life Studies. doi.org/10.18291/njwls.122197
Pathak, N. (2017). Artificial Intelligence for .NET: Speech, language and search: Building smart Applications with Microsoft Cognitive Services APIs. Apress. ISBN-10: 1484229495.
Plonski, P. (2019, December 31). Deploy Machine Learning Models with Django. https://www.deploymachinelearning.com/
Ranjeetsingh, S. S. (2014). Microsoft windows azure: Developing applications for highly available Storage of cloud service. International Journal of Science and Research (IJSR), 4(12), 662-665. doi.org/10.21275/v4i12.nov151864
Redapt Marketing. (2019). Why do ML projects fail? https://www.redapt.com/blog/why-90-of-machine-learning-models-never-make-it-to-production#:~:text=During%20a%20panel%20at%20last,actually%20make%20it%20into%20production
Runyu, Xu. (2020). A design pattern for deploying machine learning models to production. https://csusm-dspace.calstate.edu/bitstream/handle/10211.3/217176/XuRunyu_Summer2020.pdf?sequence=1
Sanderson, D. (2012). Programming Google App Engine. Sebastopol, CA: O'Reilly. https://www.oreilly.com/library/view/programming-google-app/9781449314095/
Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. Advances in Neural Information Processing Systems, 28, 2503-2511. http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems


Sharma, R. (2020). Study of supervised learning and unsupervised learning. International Journal for Research in Applied Science and Engineering Technology, 8(6), 588-593. doi.org/10.22214/ijraset.2020.6095
Schröer, C., Kruse, F., & Gómez, J. M. (2021). A Systematic Literature Review on Applying CRISP-DM Process Model. Procedia Computer Science, 181, 526-534. https://doi.org/10.1016/j.procs.2021.01.199
Singh, P. (2021). Deploy machine learning models to production: With flask, streamlit, docker and kubernetes on google cloud platform. Apress. http://103.7.177.7/handle/123456789/207519
Stack Overflow. (2020). We <3 people who code. https://stackoverflow.com/never make it into production? https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
Wang, Q. (2019). Machine learning applications in operations management and digital marketing (Doctoral dissertation, Universiteit van Amsterdam). https://abs.uva.nl/binaries/content/assets/subsites/amsterdam-business-school/research/dissertations/thesis-q.-wang---abs-2019.pdf
Washizaki, H., Uchida, H., Khomh, F., & Guéhéneuc, Y. G. (2019, December). Studying software engineering patterns for designing machine learning systems. In 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP) (pp. 49-495). IEEE. https://ieeexplore.ieee.org/abstract/document/8945075/
Zazworka, N., Shaw, M. A., Shull, F., & Seaman, C. (2011, May). Investigating the impact of design debt on software quality. In Proceedings of the 2nd Workshop on Managing Technical Debt (pp. 17-23). doi.org/10.1145/1985362.1985366
Zhang, D., & Tsai, J. J. (Eds.). (2005). Machine learning applications in software engineering (Vol. 16). World Scientific. doi.org/10.1142/9789812569271_0001
Zhang, T., Gao, C., Ma, L., Lyu, M., & Kim, M. (2019, October). An empirical study of common challenges in developing deep learning applications. In 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE) (pp. 104-115). IEEE. doi.org/10.1109/issre.2019.00020
