DSA unit 5
13
Case Studies in Data Optimization Using Python
Jahangir Alam
AMU, India
CONTENTS
13.1 Introduction
13.2 Optimization and Data Science
13.3 Literature Review
13.4 Taxonomy of Tools Available for Optimization
13.4.1 Modeling Tools
13.4.2 Solving Tools
13.4.3 Justification for Selecting OR-Tools
13.5 Case Studies Python Prerequisites
13.6 Case Studies: Solving Optimization Problems Through Python
13.6.1 Case Study 1: Product Allocation Problem
13.6.2 Case Study 2: The Transportation Problem
13.6.3 Case Study 3: The Assignment Problem
13.7 Conclusions
References
13.1 Introduction
Statistics, probability, and linear algebra are the topics usually recommended to any newcomer to data science and machine learning (ML). High-performance computing (HPC), machine learning, data science, and big data are buzzwords these days. To give companies a competitive advantage in the modern virtual world, data scientists are discovering new ways to leverage the big data available to them. These scientists are generally equipped with a combination of skills that includes programming, soft skills, and analytics (optimization, machine learning, and statistical techniques). It would not be out of context to mention here that in the present digital world companies regard data scientists as their wildcards, or perhaps as gold miners who dig for chunks of gold underground.
For a successful career in these fields, the value of a strong basis in these topics is beyond argument. However, data optimization, while underrated, is equally important to everyone who wishes to pursue a successful career in these fields. The importance of optimization as an essential step in every major social, economic, business, and personal decision taken by an individual, a group of individuals, software decision agents, or intelligent machines cannot be overstated.
AI, big data, and machine learning algorithms are incomplete without optimization. The optimization process starts with formulating a cost function and finishes with maximizing or minimizing that function, under given constraints, using a suitable optimization procedure. The accuracy of the results depends on selecting an appropriate optimization procedure. The application area of optimization is very wide, and it is difficult to find a real-life situation where it cannot be applied. Because of this broad application area, optimization has been widely researched in academia as well as industry.
An optimization problem is a problem in which we maximize or minimize a real-valued function by carefully choosing input values from an allowed set and computing the value of the function at those inputs. In other words, when we consider optimization, we always strive to find the best solution. Optimization is an essential step in modeling and problem solving in AI and allied fields such as machine learning and data science. A large number of data science and machine learning problems ultimately reduce to optimization problems. As an example, consider the approach of a data analyst who solves a machine learning problem for a large dataset. First, the analyst expresses the problem using a suitable group of prototypes (called models) and transforms the information into a format acceptable to the chosen group. The next step is to train the model. This is done by solving a core optimization problem that optimizes the variables of the prototype with respect to the selected loss function or regularization function. The process of selecting and validating the model requires the core optimization problem to be solved several times. These core optimization problems are what link research in machine learning, data science, and mathematical programming. At one end, mathematical programming provides the optimality conditions that tell the analyst what constitutes an optimal solution, and at the other end, the algorithms of mathematical programming give data analysts the procedures required to train large groups of models.
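The training step described above can be made concrete with a very small sketch. The data values, the one-parameter model y = w·x, the learning rate, and the function names below are all assumptions chosen for illustration; the chapter itself does not prescribe a loss function or a training procedure.

```python
# Toy data (hypothetical values): y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def loss(w):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def train(w=0.0, learning_rate=0.01, steps=500):
    """Minimize the loss by plain gradient descent."""
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


w_fit = train()
# For this simple model the minimizer is also available in closed form.
w_exact = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print("gradient descent:", w_fit)
print("closed form     :", w_exact)
```

Here "training" is nothing more than minimizing a loss function over the model variable w, which is the core optimization problem the paragraph refers to.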
The general form of an optimization problem is:
$$\min_{z \in \Delta} f(z)$$
$$\text{subject to: } g_i(z) \le 0, \quad i = 1, 2, 3, \ldots, n$$
$$h_j(z) = 0, \quad j = 1, 2, 3, \ldots, p$$
Where:
• n ≥ 0, p ≥ 0.
• The problem is referred to as an unconstrained optimization problem if n = p = 0.
• The above general form defines what is known as a minimization problem. A maximization problem can be brought into the same form by negating the objective function. Solving an optimization problem means determining z ∈ ∆ (the allowed set of input values) that minimizes or maximizes the objective function f subject to the inequality constraints gi(z) ≤ 0 and the equality constraints hj(z) = 0.
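The general form can be mirrored almost literally in code. The sketch below is a naive enumeration over a small finite candidate set, assuming a hypothetical objective, one inequality constraint, and one equality constraint; the function name `solve_by_enumeration` and the instance are illustrative only, and real solvers use far more efficient methods.

```python
from itertools import product


def solve_by_enumeration(f, inequalities, equalities, candidates, tol=1e-9):
    """Return (z, f(z)) for the feasible candidate with the smallest f, or None."""
    best = None
    for z in candidates:
        if any(g(z) > tol for g in inequalities):      # violates some g_i(z) <= 0
            continue
        if any(abs(h(z)) > tol for h in equalities):   # violates some h_j(z) = 0
            continue
        value = f(z)
        if best is None or value < best[1]:
            best = (z, value)
    return best


# Hypothetical instance with n = 1 inequality and p = 1 equality constraint.
f = lambda z: (z[0] - 3) ** 2 + (z[1] - 2) ** 2   # objective to minimize
g1 = lambda z: z[0] + z[1] - 4                    # g1(z) <= 0
h1 = lambda z: z[0] - 2 * z[1]                    # h1(z) = 0

# The allowed set: integer points on a 5 x 5 grid.
grid = list(product(range(5), repeat=2))
best = solve_by_enumeration(f, [g1], [h1], grid)
print(best)
```

A maximization problem can be passed to the same function by supplying the negated objective, for example `lambda z: -profit(z)`.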
Any problem that starts with the question “What is best?” can almost always be formu-
lated as an optimization problem (Boyd and Vandenberghe, 2004). For example:
To help formulate the solutions to such problems, researchers have defined a framework into which solvers fit the questions. This framework is referred to as a model. The defining feature of a model is that it has constraints and a function, referred to as the objective, which must be achieved under the given constraints. In other words, the constraints are obstacles in the way of achieving the objective. A solver who can clearly state the constraints and the objective function is close to having a model. Figure 13.1 illustrates the solution procedure for an optimization problem.
FIGURE 13.1
Steps to Solve an Optimization Problem
There are different classes of optimization problems. For a particular class, the solution procedure is an algorithm that computes a solution (up to some desired accuracy) for a given instance of that class. Efforts to develop viable algorithms for various classes of optimization problems, to develop software packages to solve them, and to analyze their properties have been under way since the late 1940s. The viability of an algorithm depends significantly on factors such as the particular form of the constraints and objective function, the number of constraints and variables, and special features such as sparsity. A problem is referred to as sparse if each constraint function depends on only a small portion of the variables. The general optimization problem is surprisingly hard to solve even when the constraint and objective functions are smooth (e.g., polynomials) (Boyd and Vandenberghe, 2004). Attempts to solve the general optimization problem therefore involve some kind of compromise, such as not finding the exact solution or accepting a long execution time. There are, however, a few exceptions to this general rule. For certain classes of problems, efficient algorithms exist that reliably solve large instances with thousands of constraints and variables. Linear programming and least squares problems belong to these classes. Convex optimization is also an exception to the general rule (Agrawal et al., 2018): efficient algorithms exist that reliably solve large convex optimization problems. General non-convex optimization problems are proved to be NP-hard (Boyd and Vandenberghe, 2004).
In the optimization field, the emphasis is shifting from model-based optimization to data-driven optimization. In this approach, data is the main source around which the optimization problem is formulated. Optimization problems are growing because the data to be processed through them is also large, and this significantly increases both the solution time and the complexity of the problem. In other words, large optimization problems built on big datasets require extensive interaction with a human solver to find a feasible solution and take longer to solve.
The rest of the chapter is organized as follows. Section 13.2 discusses optimization in the context of data science and machine learning. Section 13.3 presents a review of related research. Section 13.4 presents various tools that are used to solve optimization problems and justifies the author's choice of a specific tool (Google OR-Tools). Section 13.5 presents the Python prerequisites for running the models formulated in this chapter. Section 13.6 presents detailed case studies along with the modeling process, code, and results. Section 13.7 concludes the chapter.
• Rationalize the working of the algorithm. That is, if we get a result that we want to interpret and we have a deep understanding of optimization, we will be able to see why we got that result.
• At an even higher level of understanding, we might be able to develop new algorithms ourselves.
FIGURE 13.2
Taxonomy of Optimization Tools
vocabulary, formal constructions, and grammar for specifying the models, the solving lan-
guages can take as input the models programmed in certain modeling languages and pro-
vide the solution.
This section briefly introduces some prominent tools under each category and justifies why we have chosen a Python-based approach over the other approaches available for modeling and solving optimization problems.
TABLE 13.1
Summary of Various Modeling Tools
S.No. Tool Description
1. AMPL: To promote rapid development and reliable results, AMPL supports the entire life cycle of optimization modeling (AMPL, 2020). It offers a high-level algebraic representation of optimization models that is close to the way people think about them, along with special tools for modeling large-scale optimization problems. The system also provides a command-line language for analyzing and debugging models and a tool for manipulating optimization strategies and data. AMPL's APIs for C, C++, MATLAB, Java, R, Python, and so on ensure easy integration.
2. GAMS: GAMS is a high-level language that supports both optimization and mathematical programming (GASM, 2020). It provides a language compiler for analyzing and debugging models and several associated solvers. Real-world optimization problems can quickly be transformed into computer code using the GAMS modeling language. Its compiler puts the model in a format that can easily be understood by the associated solvers. Because many solvers support the GAMS format, users have the flexibility of testing a model on various solvers.
3. MiniZinc: MiniZinc is a free and open-source framework for modeling constraint optimization problems (The MiniZinc, 2020). It can be used to model constraint optimization problems in a solver-independent high-level language, taking advantage of a large library of predefined constraints. Models formulated in MiniZinc are compiled into another language referred to as FlatZinc. FlatZinc is a solver input language and is understood by a large number of solvers.
4. GMPL: GMPL stands for GNU Mathematical Programming Language. It is a modeling language intended for describing mathematical programming models (GLPK, 2020). To develop a model in GMPL, a high-level language is provided to the user. The model consists of data blocks and a set of statements defined by the user. A program referred to as the model translator analyzes the user-defined model and translates it into internal data structures. This process is referred to as translation. The translated model is submitted to the appropriate solver to obtain the solution of the problem.
5. ZIMPL: ZIMPL is a relatively small language (ZIMPL, 2020) for translating the mathematical model of a problem into a linear or (mixed-)integer mathematical program. The output is generated in .lp or .mps file format, which can be read and solved by an LP or MIP solver.
6. OPL: Optimization Programming Language (OPL) is an algebraic modeling language. It offers an easier and shorter coding mechanism than a general-purpose programming language (The IBM ILOG, 2020). A part of the CPLEX (IBM, 2020) software package, it is well supported by IBM through its ILOG CPLEX and ILOG CPLEX-CP optimizers. OPL supports integer, mixed-integer, constraint, and linear programming.
compatible the other doesn't have support. These constraints limit the use of modelers and solvers. Another aspect of this incompatibility is that many modelers support only some classes of mathematical optimization problems. From past experience, the author has learnt that specialized modeler and solver languages are best avoided in favor of a high-level language (e.g., C, C++, Python, or R) interfaced with a library that supports multiple solvers. Google's Operations Research Tools (OR-Tools) support this idea. OR-Tools is a well-structured, comprehensive library that offers a user-friendly interface. It effectively supports constraint programming and has special routines for network flow problems. In this chapter the author will demonstrate only a very small portion of this encyclopedia of optimization.
TABLE 13.2
Summary of Solving Tools
S.No. Tool Description
FIGURE 13.3
Parser Between Modeler and Solver
OR-Tools have been awarded four gold medals in the 2019 MiniZinc Challenge, the
international constraint programming competition (OR-Tools, 2020). Some other impor-
tant features of OR-Tools are listed below (Bodnia, 2020):
Looking at the above properties of Google’s OR-Tools the author has selected them for the
proposed case studies.
TABLE 13.3
Python Commands to Install Required Packages
Step 1: Upgrade pip
Launch the Windows command prompt as an administrator and, to upgrade pip, type the following:
python -m pip install --upgrade pip
Step 2: Install Packages
To install any package using pip, launch the Windows command prompt as an administrator and use the following command (general form):
python -m pip install package_name
Step 3: OR-Tools Installation
The command to install OR-Tools is as follows:
python -m pip install ortools
OR
!pip install ortools
(For users working with Jupyter Notebook or Google Colab platform)
Note: Two options, namely conda and pip, are available for installing packages. These case studies use pip, the tool recommended by the Python Packaging Authority (PyPA), to install packages from the Python Package Index (PyPI). With pip, Python software packages are installed as wheels or source distributions. pip is included by default with Python 2.7.9 and later and with Python 3.4 and later.
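After running the commands in Table 13.3, it can be useful to confirm from within Python that the packages are visible to the interpreter in use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is an assumption introduced here for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found by the interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the packages used in the case studies.
for name in ("pip", "ortools"):
    print(name, "->", "installed" if is_installed(name) else "missing")
```

If `ortools` is reported as missing inside a notebook, the usual cause is that pip installed the package into a different Python environment than the one running the notebook kernel.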
can be formulated and solved (not always) by applying simple linear algebraic techniques. In Case Study 1, the author considers one such problem and shows how to model and solve it.
13.6.1 Case Study 1: Product Allocation Problem (Swarup, Gupta, and Mohan, 2009)
An electronics company has three operational subdivisions (Fabrication, Testing, and Packing) with the capacity to produce three different types of components, namely E1, E2, and E3, yielding a profit of Rs. 4, Rs. 3, and Rs. 5 per component respectively. Component E1 requires 4 minutes in fabrication, 4 minutes in testing, and 12 minutes in packing. Similarly, component E2 requires 12 minutes in fabrication, 4 minutes in testing, and 4 minutes in packing. Component E3 requires 8 minutes in each subdivision. In a week, the total run time of the subdivisions is 90, 60, and 100 hours for fabrication, testing, and packing respectively. The goal is to model the problem and find the product mix that maximizes the profit.
TABLE 13.4
Product Allocation Problem Data
                        Subdivisions
Component               Fabrication    Testing        Packing        Profit/Component
                        (in minutes)   (in minutes)   (in minutes)   (in INR)
E1                      4              4              12             4
E2                      12             4              4              3
E3                      8              8              8              5
Availability (minutes)  90 × 60        60 × 60        100 × 60
a ≥ 0, b ≥ 0, and c ≥ 0 (13.1)
Step 3: The constraints are the limiting weekly working hours of each subdivision. Production of one unit of component E1 requires 4 minutes in fabrication. The quantity being a units, the fabrication requirement for component E1 alone will be 4a minutes. Similarly, b units of component E2 and c units of component E3 will require 12b and 8c fabrication minutes respectively. Thus the total weekly requirement of fabrication minutes will be 4a + 12b + 8c, which should not exceed the available 90 × 60 = 5,400 minutes. So, the first constraint can be formulated as shown in Equation 13.2:
4a + 12b + 8c ≤ 5400 (13.2)
Step 4: Similarly, the constraints for the testing and packing subdivisions can be formulated as shown in Equations 13.3 and 13.4:
4a + 4b + 8c ≤ 3600 (13.3)
12a + 4b + 8c ≤ 6000 (13.4)
FIGURE 13.4
Flowchart for Modeling and Solving an Optimization Problem
Step 5: The objective is to maximize the weekly total profit. Assuming that all components produced are immediately sold in the market, the total profit is given by Equation 13.5:
z = 4a + 3b + 5c (13.5)
Clearly, the mathematical model for the problem under consideration can be summa-
rized as shown in Table 13.5.
TABLE 13.5
Summarized Mathematical Model for Product Allocation Problem
Mathematical Model for Product Allocation Problem
Maximize:
z = 4a + 3b + 5c
Subject to:
4a + 12b + 8c ≤ 5400
4a + 4b + 8c ≤ 3600
12a + 4b + 8c ≤ 6000
a ≥ 0, b ≥ 0, and c ≥ 0
Step 6: The author selects the GLOP solver from Google's OR-Tools and the Python language to run the above model. The Python code used to solve the model is listed in Table 13.6.
A model is coded in Python in the same way as shown in that solution code. The line that calls pywraplp.Solver.CreateSolver with the 'GLOP' identifier invokes Google's own GLOP solver (OR-Tools 2020) and names it sol. OR-Tools can be interfaced with a variety of solvers. Switching to another solver, say CLP from COIN-OR (COIN-OR, 2020) or GLPK from GNU (GLPK, 2020), is just a matter of altering this line.
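The idea of swapping solvers by changing a single identifier can be sketched as follows. This assumes a recent OR-Tools release in which `pywraplp.Solver.CreateSolver` takes only the solver identifier and returns `None` when that backend is not available in the installed build; the helper name `first_available_solver` is hypothetical, and which backends are present depends on how OR-Tools was built.

```python
try:
    from ortools.linear_solver import pywraplp
except ImportError:  # OR-Tools is not installed in this environment
    pywraplp = None


def first_available_solver(names=("GLOP", "CLP", "GLPK")):
    """Return (name, solver) for the first backend that can be created, else None."""
    if pywraplp is None:
        return None
    for name in names:
        solver = pywraplp.Solver.CreateSolver(name)  # assumed to be None if unsupported
        if solver is not None:
            return name, solver
    return None


result = first_available_solver()
if result is None:
    print("No linear solver backend available")
else:
    print("Using backend:", result[0])
```

Because the rest of the model code only talks to the returned solver object, the variable, constraint, and objective definitions would not need to change when a different backend is selected.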
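Step 7: The solver output in Table 13.6 gives a = 300, b = 225, and c = 187.5, with an objective value of Rs. 2812.5. Because components are produced in whole units, c is rounded down to 187, which gives the weekly production plan shown in Table 13.7. Note that rounding a linear programming solution yields a feasible whole-unit plan here, but rounding does not in general guarantee the best whole-unit plan; a mixed-integer solver is needed for that guarantee.
Hence the profit earned with this plan is: 4(300) + 3(225) + 5(187) = Rs. 2810.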
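The fractional values printed in the output of Table 13.6 can be cross-checked without a solver. The sketch below assumes that all three capacity constraints are binding at the optimum (which is consistent with the reported values, but is not a general property of linear programs) and solves the resulting 3 × 3 linear system exactly with Cramer's rule. The helper names `det3` and `cramer3` are introduced here for illustration.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(matrix, rhs):
    """Solve a 3 x 3 linear system exactly using Cramer's rule."""
    d = det3(matrix)
    if d == 0:
        raise ValueError("The system has no unique solution")
    solution = []
    for col in range(3):
        # Replace one column of the matrix with the right-hand side.
        replaced = [[rhs[r] if k == col else matrix[r][k] for k in range(3)]
                    for r in range(3)]
        solution.append(Fraction(det3(replaced), d))
    return solution


# Constraints 13.2 to 13.4 written as equalities (all capacity fully used).
coefficients = [[4, 12, 8], [4, 4, 8], [12, 4, 8]]
minutes_available = [5400, 3600, 6000]

a, b, c = cramer3(coefficients, minutes_available)
profit = 4 * a + 3 * b + 5 * c
print("a =", a, " b =", b, " c =", c, " profit =", profit)
```

The exact fractions (c = 375/2 and profit = 5625/2) match the floating-point values 187.5 and 2812.5 reported by GLOP, apart from rounding noise in the solver output.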
Step 8: Model validation: If the solution obtained is correct, then it should satisfy every
constraint. Table 13.8 validates the model:
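The validation described in Step 8 can also be scripted. The following sketch recomputes the minutes used in each subdivision for the whole-unit plan of Table 13.7 and compares them with the available minutes; it is a recalculation from the problem data, not a reproduction of the original table rows, and the names `CONSTRAINTS`, `PROFIT`, and `validate` are assumptions for illustration.

```python
# Coefficients of (a, b, c) and the available minutes for each subdivision.
CONSTRAINTS = {
    "Fabrication": ((4, 12, 8), 5400),
    "Testing": ((4, 4, 8), 3600),
    "Packing": ((12, 4, 8), 6000),
}
PROFIT = (4, 3, 5)  # Rs. per unit of E1, E2, E3


def validate(plan):
    """Return {subdivision: (minutes_used, minutes_available, satisfied)}."""
    report = {}
    for name, (coeffs, limit) in CONSTRAINTS.items():
        used = sum(k * x for k, x in zip(coeffs, plan))
        report[name] = (used, limit, used <= limit)
    return report


plan = (300, 225, 187)  # weekly units of E1, E2, E3 from Table 13.7
report = validate(plan)
non_negative = all(x >= 0 for x in plan)
total_profit = sum(p * x for p, x in zip(PROFIT, plan))

for name, (used, limit, ok) in report.items():
    print(f"{name}: {used} <= {limit} -> {'satisfied' if ok else 'unsatisfied'}")
print("Non-negativity satisfied:", non_negative)
print("Profit: Rs.", total_profit)
```

The same function can be called with any other candidate plan to check whether it respects the three capacity limits.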
13.6.2 Case Study 2: The Transportation Problem (Swarup, Gupta, and Mohan, 2009)
XYZ makes trailers at plants in Frankfurt, Copenhagen, and Seoul, and ships these units to distribution centers in London, Paris, New York, and Tokyo. In planning production for the next year, XYZ estimates the unit shipping cost (in US dollars) between each plant and distribution center, the plant capacities, and the distribution center demands. These numbers are given in Table 13.9.
XYZ faces the problem of determining how much to ship between each plant and distribution center to minimize the total transportation cost, while not exceeding capacity and while meeting demand.
(a) Formulate a mathematical model to minimize the total shipping cost.
(b) Set up and solve the problem on a spreadsheet. What is the optimal solution?
Steps 1 and 2: From the statement of the problem, it is clear that twelve decision variables are required, one for each pair of plant and distribution center. The decision variables can be expressed as x11, x12, x13, x14, …, x32, x33, x34, where xij is the number of units shipped from plant i to distribution center j.
Step 3: Let cij be the cost of shipping one trailer from plant i to distribution center j. The cost of shipping xij units can therefore be expressed as cij xij.
Steps 4 and 5: The objective function to be minimized and the applicable constraints can therefore be formulated as shown in Table 13.10. Equation 13.6 expresses the objective function while Equations 13.7 to 13.11 express the constraints.
Steps 6 and 7: As the mathematical model of the problem shows, solving the model requires two-dimensional subscripted variables, which are stored in a Python dictionary. The author selects the GLOP solver from Google's OR-Tools and the Python language to run the above model. The Python code to solve the model and the output obtained are shown in Table 13.11:
TABLE 13.6
Python Code to Solve Product Allocation Problem
Python Code for Product Allocation Problem
# Install the required package (run once; the leading "!" works only in a Jupyter/Colab cell)
# !pip install ortools

# Import the linear solver wrapper
from ortools.linear_solver import pywraplp

# Invoke the solver with GLOP (recent OR-Tools releases take only the solver id)
sol = pywraplp.Solver.CreateSolver('GLOP')

# Create variables a, b, c; the lower bound 0 enforces non-negativity
a = sol.NumVar(0, sol.infinity(), 'a')  # a >= 0
b = sol.NumVar(0, sol.infinity(), 'b')  # b >= 0
c = sol.NumVar(0, sol.infinity(), 'c')  # c >= 0
print('Decision variables =', sol.NumVariables())

# First constraint (fabrication): 4a + 12b + 8c <= 5400
cst1 = sol.Constraint(0, 5400, 'cst1')
cst1.SetCoefficient(a, 4)
cst1.SetCoefficient(b, 12)
cst1.SetCoefficient(c, 8)

# Second constraint (testing): 4a + 4b + 8c <= 3600
cst2 = sol.Constraint(0, 3600, 'cst2')
cst2.SetCoefficient(a, 4)
cst2.SetCoefficient(b, 4)
cst2.SetCoefficient(c, 8)

# Third constraint (packing): 12a + 4b + 8c <= 6000
cst3 = sol.Constraint(0, 6000, 'cst3')
cst3.SetCoefficient(a, 12)
cst3.SetCoefficient(b, 4)
cst3.SetCoefficient(c, 8)
print('Total constraints =', sol.NumConstraints())

# Objective function: maximize z = 4a + 3b + 5c
objf = sol.Objective()
objf.SetCoefficient(a, 4)
objf.SetCoefficient(b, 3)
objf.SetCoefficient(c, 5)
objf.SetMaximization()

# Solve and stop early if no optimal solution was found
status = sol.Solve()
if status != pywraplp.Solver.OPTIMAL:
    raise RuntimeError('No optimal solution found')

print('Product Allocation Problem Solution:')
print('Objective value =', objf.Value())
print('a =', a.solution_value())
print('b =', b.solution_value())
print('c =', c.solution_value())
Output
Decision variables = 3
Total constraints = 3
Product Allocation Problem Solution:
Objective value = 2812.5
a = 300.00000000000006
b = 225.0
c = 187.49999999999997
Note: All Python code used in this chapter has been run using the Jupyter Notebook and Google Colab web applications. These applications allow users to create and share documents that contain equations, live code, narrative text, and visualizations (Jupyter, 2020). The code is available in the author's repository on GitHub. The URL to access the code is: https://github.com/jahangir-amu2020/ORCS.
TABLE 13.7
Optimal Production per Week
Product Units to be produced per week
E1 300
E2 225
E3 187
TABLE 13.8
Model Validation
Constraint Calculated Value Equality/Inequality Satisfied/Unsatisfied
TABLE 13.9
Unit Shipping Cost (in US Dollars) Between Plants and Distribution Center, Plant Capacities, and
Distribution Center Demands
Distribution Center
Plant London Paris New York Tokyo Capacity
Step 8: Model verification: This can be done in the same way as in the last step of Case Study 1. All constraints are satisfied, so the model is valid.
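A verification helper for the transportation model might look like the sketch below. The cost, capacity, and demand figures are the ones used in the code of Table 13.11. The shipment plan is a hypothetical feasible plan chosen only to exercise the check; it is not the solver's output and is not claimed to be optimal. Plants and distribution centers are referred to by index, as in the code, and the names `check_plan` and `plan` are illustrative.

```python
# Problem data as used in Table 13.11 (rows: plants, columns: distribution centers).
COST = [
    [35, 40, 60, 120],
    [30, 30, 45, 130],
    [60, 65, 50, 100],
]
CAPACITY = [12000, 8000, 5000]
DEMAND = [9000, 3000, 9500, 1500]


def check_plan(plan):
    """Return (feasible, total_cost) for a 3 x 4 shipment matrix."""
    shipped = [sum(row) for row in plan]                       # units leaving each plant
    received = [sum(plan[i][j] for i in range(len(plan)))      # units reaching each center
                for j in range(len(DEMAND))]
    feasible = (
        all(x >= 0 for row in plan for x in row)
        and all(s <= cap for s, cap in zip(shipped, CAPACITY))
        and all(r >= dem for r, dem in zip(received, DEMAND))
    )
    total_cost = sum(COST[i][j] * plan[i][j]
                     for i in range(len(plan)) for j in range(len(DEMAND)))
    return feasible, total_cost


# Hypothetical plan for illustration only (feasible, not necessarily optimal).
plan = [
    [9000, 3000, 0, 0],
    [0, 0, 8000, 0],
    [0, 0, 1500, 1500],
]
feasible, total_cost = check_plan(plan)
print("Feasible:", feasible, " Total cost:", total_cost)
```

To verify the actual solution, the values printed by the solver for each x[i, j] would be placed in `plan`, and the returned cost compared with the reported objective value.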
13.6.3 Case Study 3: The Assignment Problem (Swarup, Gupta, and Mohan 2009)
As the last case study of this introductory chapter, the author presents a solution for another important optimization problem referred to as the assignment problem. A careful examination shows that the assignment problem is a special case of the transportation problem in which the objective is to assign a certain number of resources to an equal number of activities at maximum profit (or minimum cost). Another form is the unbalanced assignment problem, in which the number of resources differs from the number of activities to be performed (for example, more resources than activities).
Following is an example of an assignment problem:
A department head has four subordinates and four tasks to be performed. The subordinates differ in efficiency, and the tasks differ in their intrinsic difficulty. The head's estimate of the time each subordinate would take to perform each task is given in Table 13.12:
How should the head allocate the tasks to the subordinates (one task to each) so as to minimize the total time to complete the tasks?
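TABLE 13.10
Mathematical Model for Transportation Problem
Mathematical Model for the Transportation Problem
Minimize:
$$z = \sum_{i=1}^{3} \sum_{j=1}^{4} c_{ij} x_{ij} \qquad (13.6)$$
Subject to:
$$\sum_{i=1}^{3} x_{i1} \ge 9000, \quad \sum_{j=1}^{4} x_{1j} \le 12000 \qquad (13.7)$$
$$\sum_{i=1}^{3} x_{i2} \ge 3000, \quad \sum_{j=1}^{4} x_{2j} \le 8000 \qquad (13.8)$$
$$\sum_{i=1}^{3} x_{i3} \ge 9500, \quad \sum_{j=1}^{4} x_{3j} \le 5000 \qquad (13.9)$$
$$\sum_{i=1}^{3} x_{i4} \ge 1500 \qquad (13.10)$$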
Step 1: To express the above assignment problem mathematically, we consider the generalized form of an assignment problem in which n resources are to be assigned to n activities. The cost of assigning resource i to activity j is denoted by cij. Table 13.13 describes the cost matrix for the problem.
The cost matrix has the same form as in the transportation problem. However, this time the requirement at each destination and the availability of each resource are unity (1), because assignments are made on a one-to-one basis.
Step 2: Let xij denote the assignment of the ith resource to the jth activity, such that:
$$x_{ij} = \begin{cases} 1, & \text{if resource } i \text{ is assigned to activity } j \\ 0, & \text{otherwise} \end{cases} \qquad (13.12)$$
Step 2: Following the above notation, the generalized assignment problem can be formulated mathematically as shown in Table 13.14:
Steps 3, 4, and 5: Based on the above general mathematical formulation, the problem considered in the present case study can be formulated as shown in Table 13.15:
Steps 6 and 7: As the mathematical model of the problem shows, solving the model requires two-dimensional subscripted variables, which are stored in a Python dictionary. The author selects the CBC solver (a MIP solver) available through Google's OR-Tools and the Python language to run the above model.
TABLE 13.11
Python Code to Solve Transportation Problem
Python Code for Transportation Problem
# Install the required package (run once; the leading "!" works only in a Jupyter/Colab cell)
# !pip install ortools

# Import the linear solver wrapper
from ortools.linear_solver import pywraplp


def transmodel():
    """Initialize problem data."""
    pd = {}
    pd['cbound'] = [12000, 8000, 5000]        # plant capacities
    pd['dbound'] = [9000, 3000, 9500, 1500]   # distribution center demands
    pd['obcoeff'] = [                         # unit shipping costs c[i][j]
        [35, 40, 60, 120],
        [30, 30, 45, 130],
        [60, 65, 50, 100],
    ]
    pd['ncc'] = 3   # number of plants
    pd['ndc'] = 4   # number of distribution centers
    return pd


pd = transmodel()
# Recent OR-Tools releases take only the solver id
solver = pywraplp.Solver.CreateSolver('GLOP')
inf = solver.infinity()
x = {}

# Create variables; the lower bound 0 enforces x[i, j] >= 0
for i in range(pd['ncc']):
    for j in range(pd['ndc']):
        x[i, j] = solver.NumVar(0, inf, '')
print('Number of variables =', solver.NumVariables())

# Enforce capacity constraints: shipments out of plant i <= capacity
for i in range(pd['ncc']):
    constraint = solver.RowConstraint(0, pd['cbound'][i], '')
    for j in range(pd['ndc']):
        constraint.SetCoefficient(x[i, j], 1)

# Enforce demand constraints: shipments into center i >= demand
for i in range(pd['ndc']):
    constraint = solver.RowConstraint(pd['dbound'][i], inf, '')
    for j in range(pd['ncc']):
        constraint.SetCoefficient(x[j, i], 1)
print('Number of constraints =', solver.NumConstraints())

# Formulate the objective function
objf = solver.Objective()
for i in range(pd['ncc']):
    for j in range(pd['ndc']):
        objf.SetCoefficient(x[i, j], pd['obcoeff'][i][j])
objf.SetMinimization()

solver.Solve()
print('Transportation Problem Solution:')
print('Objective value =', objf.Value())
for i in range(pd['ncc']):
    for j in range(pd['ndc']):
        print('x[', i, j, ']', ' = ', x[i, j].solution_value())
TABLE 13.12
Time Required by Each Subordinate
to Perform Each Task
Subordinate
Tasks A B C D
T1 18 26 17 11
T2 13 28 14 26
T3 38 19 18 15
T4 19 26 24 10
TABLE 13.13
Cost Matrix for Assignment Problem
Activities
Resources A1 A2 … An Available
The Python code to solve the model is available in the author's repository and can be accessed using the URL: https://github.com/jahangir-amu2020/ORCS/blob/master/Case-Study-3.pdf.
The code also illustrates how to solve the assignment problem using a mixed-integer programming (MIP) solver.
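For a problem as small as the one in Table 13.12, the result of any solver can be cross-checked by enumerating all 4! = 24 possible one-to-one assignments. The sketch below does exactly that with the standard library; it is not the author's CBC-based code, and the names `TIME` and `best_assignment` are introduced here for illustration. Enumeration is practical only for very small n, since the number of assignments grows as n!.

```python
from itertools import permutations

# Times from Table 13.12: TIME[task][subordinate].
TIME = {
    "T1": {"A": 18, "B": 26, "C": 17, "D": 11},
    "T2": {"A": 13, "B": 28, "C": 14, "D": 26},
    "T3": {"A": 38, "B": 19, "C": 18, "D": 15},
    "T4": {"A": 19, "B": 26, "C": 24, "D": 10},
}


def best_assignment(time):
    """Try every one-to-one assignment and return (total_time, {task: subordinate})."""
    tasks = list(time)
    people = list(time[tasks[0]])
    best = None
    for perm in permutations(people):
        total = sum(time[task][person] for task, person in zip(tasks, perm))
        if best is None or total < best[0]:
            best = (total, dict(zip(tasks, perm)))
    return best


total_time, assignment = best_assignment(TIME)
print("Minimum total time:", total_time)
for task, person in assignment.items():
    print(task, "->", person)
```

Each permutation corresponds to one setting of the binary variables xij in which every row and every column of the assignment matrix contains exactly one 1, so the enumeration searches the same feasible set as the MIP formulation.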
TABLE 13.14
Mathematical Model for Generalized Assignment Problem
Minimize:
$$z = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \qquad (13.13)$$
Subject to:
$$\sum_{i=1}^{n} x_{ij} = 1 \ \text{ and } \ \sum_{j=1}^{n} x_{ij} = 1; \quad \text{where } x_{ij} = 0 \text{ or } 1 \qquad (13.14)$$
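TABLE 13.15
Mathematical Model for Case Study Assignment Problem
Minimize:
$$z = \sum_{i=1}^{4} \sum_{j=1}^{4} c_{ij} x_{ij} \qquad (13.15)$$
Subject to:
$$\sum_{i=1}^{4} x_{ij} = 1 \ \text{ and } \ \sum_{j=1}^{4} x_{ij} = 1; \quad \text{where } x_{ij} = 0 \text{ or } 1 \qquad (13.16)$$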
13.7 Conclusions
In the past few years, research in mathematical optimization, machine learning, and data science has become highly interrelated. Branches of mathematical optimization are being fully exploited by machine learning researchers. With the help of available mathematical optimization modelers, algorithms, and robust solvers, data scientists have an ideal toolkit for exploring new machine learning problems. The machine learning models so obtained require highly efficient and accurate modelers and solvers. As pointed out earlier, not all modelers support all solvers, so the modeler and solver must be used in combination with a high-level language such as C, C++, Java, or Python.
In this chapter the author has focused on Google's Operations Research Tools (OR-Tools) and has shown how some well-known optimization problems can be solved using OR-Tools in combination with the Python language. In each case, a mathematical model has first been developed, which is the primary aim of the author. The model has then been coded and solved using OR-Tools and Python. To keep the subject matter simple and easy for all those who are entering the field of data science and optimization, the author has only
References
Agrawal, A., Verschueren, R., Diamond, S., and S. Boyd. 2018. A rewriting system for convex optimization problems. Journal of Control and Decision 5(1):42–60.
AMPL. 2020. "AMPL: Streamlined modeling for real optimization." (accessed October 17, 2020) https://ampl.com/
Bazaraa, M., Sherali, H., and C. Shetty. 2006. Nonlinear Programming: Theory and Algorithms. Wiley.
Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., and D.D. Cox. 2015. Hyperopt: a Python library for model selection and hyperparameter optimization. Computational Science & Discovery 8(1).
Bertsekas, D.P. 2004. Nonlinear Programming. Athena Scientific, Cambridge.
Bishop, C. 1996. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
Blank, J., and K. Deb. 2020. pymoo: multi-objective optimization in python. IEEE Access 8:89497–89509.
Bodnia, V. 2020. Google OR-Tools business value and potential. (accessed October, 2020) https://freshcodeit.com/google-or tools#:~:text=The%20primary%20purpose%20of%20using,%2C%20graph%20algorithms%2C%20and%20more.
Boyd, S., and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press, The Edinburgh Building, Cambridge.
Chugh, T., Sindhya, K., Hakanen, J., and K. Miettinen. 2017. Handling computationally expensive multiobjective optimization problems with evolutionary algorithms: a survey. Soft Computing 23:3137–3166.
COIN-OR. 2020. "Computational Infrastructure for Operations Research." (accessed October, 2020) https://en.wikipedia.org/wiki/COIN-OR
Cortes, C., and V. Vapnik. 1995. Support-vector networks. Machine Learning 20(3):273–297.
Diamond, S., and S. Boyd. 2016. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research 17(83):1–5.
Dolan, E.D., and J.J. Moré. 2002. Benchmarking optimization software with performance profiles. Mathematical Programming 91(2):201–213.
ECLiPSe. 2020. "The ECLiPSe Constraint Programming System." (accessed September, 2020) http://eclipseclp.org/
GASM. 2020. "GAMS System Overview." (accessed August, 2020) https://www.gams.com/products/gams/gams-language/
GECODE. 2020. "Generic constraint development environment." (accessed August, 2020) https://www.gecode.org/
GLPK. 2020. "GNU Linear Programming Kit." (accessed August, 2020) https://www.gnu.org/software/glpk/
Goberna, M.A., and M.A. Lopez. 1998. Linear Semi-Infinite Optimization. John Wiley, New York.
Golub, G.H., and U. von Matt. 1997. Generalized cross-validation for large scale problems. Journal of Computational and Graphical Statistics 6(1):1–34.
GUROBI. 2020. "GUROBI Optimization." (accessed August, 2020) https://www.gurobi.com/