
Application of Machine Learning Techniques in Project Management Tools

Miguel Pedroso (70556)

Instituto Superior Técnico - Universidade de Lisboa

May 2017

Abstract

Nowadays, it is more important than ever to be as efficient as possible in order to overcome the difficulties brought by the financial crisis and the economic deceleration that have been striking western countries. The high rates of project failure due to poor project planning hold back teams and potential wealth creation. In order to assist project managers in planning their projects and assessing risks, we created a system that helps project managers predict potential risks when they are planning their project milestones, based on their own previous experiences. The system is divided into 3 components: the Web-based platform, the database and the Machine Learning core. In order to achieve this goal, we applied several Artificial Intelligence techniques. It is of utmost importance that our system is able to start performing risk analysis and providing suggestions to project managers as early as possible, with as little information as possible fed into the system. In order to do this, we explore the application of techniques such as instance-based learning, a family of machine learning algorithms that compares new problem instances with instances that were previously observed during training, instead of performing explicit generalization like other algorithms such as neural networks. We also tried other methods, such as regression model algorithms. The results from our evaluation are quite reasonable and show that the learning algorithms work well when tested against scenarios that are expected to happen in the real world.

1 Introduction

According to The Bull Survey (1998), the main reasons why IT projects fail are related to problems during project management. Some projects are not properly planned, for example by having an improper requirement definition and a poor risk analysis. Risk analysis is a task of utmost importance that should be done from a very early stage of the project. It is important to identify the probability of something going wrong during the course of the project, and if something actually goes wrong, the team needs to have a plan to deal with the situation. When these situations occur while the project is in a later phase of development, the risks can sometimes make the whole project not viable, and therefore it will need to be canceled after a great amount of time and resources have been invested.

To help overcome these challenges I decided to study and create a platform to be used by teams to help them improve their project planning, management and control. This platform will learn each user's particular management and planning style to provide suggestions, identifying risks based on the user's previous history.

1.1 Problem

A more specific description of the problem that this dissertation is intended to solve is helping the user to improve his/her project planning activities in order to avoid and mitigate risks.

If we logically divide a project into several milestones, and each milestone into the several tasks that are necessary to complete it, then the technical problem that we really want to solve consists of identifying similar milestone instances. If we model a solution where a milestone can have zero or more problems/occurrences associated with it, then every time a project manager is planning a new milestone using our system, we can look at his/her work history, find similar milestones and check them in order to perform risk analysis on the milestone that he/she is currently planning.

This is an interesting problem, as it must be solved in real-world scenarios and take human factors into account (for example, that a project manager learns over time and tends not to repeat the same mistakes indefinitely). A possible solution to this problem will be carefully studied in the rest of this document, as we consider it essential to build a system that is able to learn the behavioral patterns of the project manager and his/her team, and assist them during their planning in order to maximize their success and mitigate potential risks.

1.2 Goals

This thesis aims to solve the problem of high project failure rates due to bad or inadequate planning.
In order for this tool to be useful, deployed and adopted in real-world scenarios, the machine learning algorithms must return solutions in real time or near-real time, and they must be integrated into a pleasant and easy-to-use graphical user interface. To achieve our goal, we must extensively analyze the existing scientific work in the machine learning field and adapt it to our problem.

2 Related Work

In this thesis we address the problem of IT project failure due to poor planning and bad risk analysis. In this section we review both project management methods and machine learning techniques and applications.

2.1 IT Project Management

Project Management is defined as a discipline that has the goal of initiating, planning, executing, controlling and closing (Project Management Institute, 2004) the work of a team which aims to achieve certain predefined success criteria. A Project can be defined as an individual or collaborative endeavor that is planned to achieve a particular aim. A Project Manager, on the other hand, is the individual that has the responsibility of planning and executing a particular project.

In the following sub-sections some of the popular project design processes are presented in detail. Despite the fact that project management is a general discipline, we will keep our focus on project management techniques applied to IT projects.

2.1.1 The Waterfall model

The waterfall model (Royce, 1970) is a sequential project design process where the project progresses in an orderly way through a sequence of defined steps, from the product concept or requirements analysis to system testing. The waterfall model usually serves as a basis for more complex and effective project life-cycle models.

The number of phases or steps in the waterfall model is flexible and depends on what the project is and on how the project manager wants to organize the project's development; however, the traditional waterfall model usually features the following 5 phases:

• Requirement analysis - The project manager confers with all the project's stakeholders and the project's team and gathers all the requirements in a requirement specification document.
• Design - The requirements of the project are carefully studied by the project's team and the project's architecture is planned.
• Implementation - The development team starts to implement the software based on the design obtained in the previous phase. The software is divided into small units. After the development of all the units is completed, they are all integrated.
• Verification - In the verification phase, the software is properly tested against the project's requirements and use cases.
• Maintenance - This phase consists of making small changes in the software to adapt it to possible changes of the requirements that arose during the previous phases.

The biggest disadvantage of the waterfall model is that it is usually very hard to fully specify the project requirements at the beginning of the project, before any design or development has been performed.

The person or organization that ordered the development of the project does not always know from the early stage of the problem what they really want to be built. Non-technical users and organizations always plan to start a project with the goal of solving at least one problem, but some of the requirements to achieve that goal are very often only fully discovered during the design or development of the actual project.

Despite the previously pointed-out disadvantages, the waterfall model works quite well as a life-cycle model in projects that are well understood from a technical point of view and that have a stable requirements definition from the beginning of the project planning (McConnell, 1996).

2.1.2 Spiral model

The spiral model (Boehm et al., 1987) is a risk-driven process model. It consists of partitioning the project development process into multiple flexible loops that are appropriate to the context of the process being developed, but usually it can be broken down into the following steps:

• Determine objectives / Planning - During this phase the project's requirements are all gathered into a requirement specification document.
• Risk analysis / Identify and resolve risks - In this phase, risks are identified and analyzed. If any risk is found, the team will suggest alternatives to try to mitigate it, or an appropriate contingency plan will be created.
• Development and tests - This is the phase where the actual software is developed. After the development is completed, the software is then properly tested.
• Plan the next iteration - In this phase the customer evaluates the project in its current state before it goes into the next spiral.

A project that is being developed using the spiral model is planned to start small and incrementally expand its scope. The scope of the project is only expanded after the project manager reduces the risks of the next increment of the project to a level that he/she considers acceptable. The spiral model is, therefore, highly dependent on risk analysis, which must be done by people with specific expertise.
In the spiral model, costs typically increase as the project is being developed, but risks decrease (McConnell, 1996). For a project that needs to be developed quickly using this model, the more time and money that is spent, the fewer risks the project leader is taking.

As the spiral model life cycle is risk-oriented, it is able to provide very valuable feedback even in early stages of the project. If the project cannot possibly be done for technical or financial reasons, the project manager will find out early and will be able to cancel the project before too much time and money is spent. The spiral model's cycles, just like most project management models, are flexible and can be adapted to each project's needs.

2.2 Software Tools for Project Management

There are currently many software tools on the market that have the goal of helping project managers to plan their projects, organize tasks, perform risk assessment and other related features. Some of these tools will be briefly described in the following sub-sections. Even though there is a considerable amount of good project management software, most of these tools do not offer any kind of "machine learning" solutions to assist the users.

2.2.1 Microsoft Project

Microsoft Project is project management software developed by Microsoft and, as of 2017, it is the most widely used project management program in the world (http://project-management.zone/ranking/planning). Microsoft Project allows project managers to develop project plans, create and assign resources to tasks and track the progress of the project, among other features. The project flow can be visualized in Gantt charts.

2.2.2 Gantt charts

A Gantt chart is a horizontal bar chart that serves as a production control tool in project management and illustrates the project schedule (Wilson, 2003). This type of diagram makes it easy for both the project manager and the project's team members to see which activities have to be done and when.

2.3 Machine Learning and Project Management

Machine learning is a field of Computer Science that has the goal of giving computers the ability to learn without being explicitly programmed. In this thesis we study the practical application of machine learning approaches to project management software, with the goal of assisting the project manager to develop better project plans and to identify and mitigate risks at an early stage of the project.

2.3.1 Instance-based learning

Instance-based learning is a family of machine learning algorithms that compares new problem instances with instances that were previously observed during training, instead of performing explicit generalization like other algorithms such as neural networks (Aha et al., 1991).

In our particular domain, we want our system to learn each individual user's patterns and provide information that will guide him/her over the course of his/her project management activities. In other common types of machine learning, a lot of data is collected and used to train a learning model, like logistic regression or artificial neural networks. These models learn an explicit representation of the training data and can then be used to classify new examples. The problem with this approach is that it requires a considerable amount of training data, whereas in our practical domain we want to help the user of our system by providing suggestions or identifying project risks as early as possible. The suggestions and risk analysis are dependent on each user and project type, so in our domain it is not appropriate to construct a single large dataset for the entire platform. The user-dependent information in the system is what must be used to provide information to that particular user. It is for this particular reason that instance-based learning must be explored in our system, as it is able to perform machine learning on a small and dynamically growing amount of data.

2.3.2 k-Nearest neighbors algorithm

The k-Nearest Neighbors (k-NN) algorithm (Altman, 1992) is a very simple instance-based learning algorithm where each training example is defined as a vector in R^N and is stored in a list of training examples every time it is observed. All the computation involved in classification is delayed until the system receives a classification query. A query consists of performing either classification or regression on the query point.

• Classification – The output of the algorithm is an integer that denotes the class membership of the majority of the input point's k nearest neighbors.
• Regression – The output of the algorithm is a real-valued number y that is the average of the values of the k nearest training examples to the input query point.

The common distance metric used in the k-NN algorithm is the Euclidean distance:

d(x_1, x_2) = \sqrt{\sum_{r=1}^{n} (a_r(x_1) - a_r(x_2))^2}

where x_1 and x_2 are two data points in R^n and a_r(x) denotes the value of the r-th coordinate of the data point x. Training a k-NN model can be done using Algorithm 1, and k-NN classification can be performed with Algorithm 2.
Algorithm 1 k-NN Training algorithm
1: procedure TRAIN
2:    Let D_examples be a list of training examples
3:    For each training example (x, f(x)), add the example to the list D_examples

Algorithm 2 k-NN Classification algorithm
1: procedure CLASSIFY
2:    Let x_q be a query instance
3:    Let x_1 ... x_k be the k instances from the training examples nearest to x_q by a distance metric D
4:    Return f(x_q) := argmax \sum_{i=1}^{k} f(x_i)
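A minimal Python sketch of Algorithms 1 and 2, using the Euclidean distance defined in Section 2.3.2, may make the two procedures concrete. The function names, example vectors and labels below are illustrative only, not the thesis implementation.

```python
import math
from collections import Counter

def euclidean(x1, x2):
    # d(x1, x2) = sqrt(sum_r (x1[r] - x2[r])^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_train(examples, x, label):
    # Algorithm 1: training simply stores the observed instance.
    examples.append((x, label))

def knn_classify(examples, xq, k=3):
    # Algorithm 2: take the k stored instances closest to the query
    # and return the majority label among them.
    neighbors = sorted(examples, key=lambda e: euclidean(e[0], xq))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Usage: each milestone is a feature vector, the label is its top problem type.
history = []
knn_train(history, [5, 12, 30.0], 0)   # problem type 0
knn_train(history, [2, 4, 7.0], 1)     # problem type 1
knn_train(history, [6, 10, 28.0], 0)
print(knn_classify(history, [5, 11, 29.0], k=3))  # -> 0
```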
2.3.3 Distance-weighted nearest neighbor algorithm

The k-nearest neighbor algorithm can be modified to weight the contribution of each of the neighbors of the query point x_q, giving greater weight to close neighbors than to distant ones. For example, when classifying a query point with the k-NN algorithm with k = 5, if 3 of the 5 nearest neighbors are clustered very far from the query point x_q and from its 2 other neighbors, they may introduce noisy error into the classification of the point.

By weighting the contribution of each point according to its distance we considerably reduce this problem. For example, the weight w of each stored data point i can be calculated with the expression below, which consists of the inverse square distance between the query point and each of the data points:

w_i = \frac{1}{d(x_q, x_i)^2}

with d denoting the selected distance metric. With this tweak to the k-NN algorithm, the classification algorithm does not even need to search for the k nearest neighbors of the query point, as the inverse-square distance weight almost eliminates the contribution of distant points, so it becomes appropriate to perform classification using the contribution of the entire stored training set.
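The weighting just described can be sketched in a few lines of Python. The small eps term that avoids a division by zero for identical points is our addition, not part of the original formulation.

```python
import math
from collections import defaultdict

def euclidean(x1, x2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def distance_weighted_classify(examples, xq, eps=1e-9):
    # Every stored (vector, label) pair votes with weight w_i = 1 / d(xq, xi)^2,
    # so distant points contribute almost nothing; eps avoids division by zero
    # when an identical milestone is already stored.
    votes = defaultdict(float)
    for xi, label in examples:
        votes[label] += 1.0 / (euclidean(xq, xi) ** 2 + eps)
    return max(votes, key=votes.get)
```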
2.3.4 The curse of dimensionality

Dealing with data points in a high-dimensional hyperspace brings certain difficulties to nearest neighbor search algorithms. For example, let x ∈ R^n with n = 50 be a vector where each coordinate represents one of the 50 features available in a specific domain. Let us suppose that we want to perform a classification task on that dataset, but only 3 of the 50 dimensions of the vector are actually relevant for our classification. While the 3 dimensions that represent relevant features could form clusters of objects of the same category (e.g. the vectors of points of the same category are near in space according to some distance metric d), the other 47 dimensions could push these points of the same class very far apart, rendering the nearest neighbor algorithm completely useless without previous data processing.

This problem is called the "curse of dimensionality" (Bellman, 2003) and occurs due to the exponential increase in volume when extra dimensions are added to a mathematical space.

The standard implementation of the k-NN algorithm can therefore be rendered useless on vector spaces with a high number of dimensions. In this case, the main problem stops being to find the k nearest neighbors of the queries passed to the algorithm and becomes finding a "good" distance function that is able to capture the similarity of two feature vectors, where each feature has a different weight contribution to the distance function, based on the importance of that feature in the context of the problem itself.

2.3.5 Feature-weighted nearest neighbor

One of the ways of overcoming the curse of dimensionality problem is to weight the contribution of each feature (Inza et al., 2002; Tahir et al., 2007) when performing k-nearest neighbor search. The feature vector is composed of a meaningful representation of an instance of an individual model, but not all the features have the same importance in each specific classification or regression problem. In fact, some of the selected features may even be found to be irrelevant for a particular problem.

For example, the following equation can be used to calculate the feature-weighted Euclidean distance between two feature vectors:

d(x_1, x_2) = \sqrt{\sum_{r=1}^{n} (w_r a_r(x_1) - w_r a_r(x_2))^2}

where w_r denotes the weight of the r-th feature in the feature vectors.

2.4 Regression models

2.4.1 Logistic regression

Logistic regression is a regression model where the dependent variable (i.e. the variable that represents the output value of the model) is categorical (Freedman, 2005). Logistic regression can either have a binomial dependent variable (in the case where there are only two target categories) or a multinomial one (when there are more than two categories).

The goal of logistic regression is to create a model that receives an input vector x (e.g. a vector that represents a real-world object) and outputs an estimate y. The following expressions can be used to calculate the prediction of a given logistic regression model:

y = \sigma(Wx + b)    (1)

\sigma(z) = \frac{1}{1 + e^{-z}}    (2)

The output of a logistic regression model consists of the weighted sum of the inputs passed through a sigmoid activation function. The weights W and the bias term b are the parameters of the model that need to be learned. If the non-linear activation function were not used, then logistic regression would be the same as linear regression (Zou et al., 2003). The sigmoid function σ is bounded between 0 and 1 and is a simple way of introducing a non-linearity into the model and conveying a more complex behavior.

In order to train the regression model (i.e. to find the desired values for the parameters W and b), we can employ numerical optimization techniques such as Stochastic Gradient Descent (SGD) (Ruder, 2016).

Regression can be used not only to train a model that is able to make predictions, but also to weight the contribution of each feature in the feature vector for a particular problem. The magnitude of each learned parameter in binomial logistic regression does not directly represent the "semantic importance" of the feature that the parameter is associated with, in the context of the problem being modeled. For example, the feature vector f of dimension n could have one feature that is a mere scaling of another feature (e.g. f_1 = 100 f_0). In this case, the learned coefficients associated with those features would differ by a factor of about 100, while the semantic importance of features f_0 and f_1 would be the same.

However, the model will tend to learn smaller coefficients for less important features. As an example, if feature f_3 consists of random noise and has no semantic importance in the classification problem whatsoever, then, as long as the model converges during training, a small coefficient will be learned so that the noisy feature gets filtered out.
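A minimal NumPy sketch of the prediction defined by equations (1) and (2) follows; the weight values shown are illustrative, not learned parameters.

```python
import numpy as np

def sigmoid(z):
    # Equation (2): squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict(W, b, x):
    # Equation (1): weighted sum of the inputs passed through the sigmoid.
    return sigmoid(np.dot(W, x) + b)

# Illustrative parameters for a 3-feature milestone vector.
W = np.array([0.8, -0.3, 0.05])
b = -0.2
print(predict(W, b, np.array([1.0, 2.0, 10.0])))  # probability of a risk
```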
2.4.2 Cost function

A cost function (also called a loss function) is a function C used in numerical optimization problems that returns a real scalar number representing how well a model maps training examples into the correct output. The binary cross-entropy function is a way of comparing the difference between two probability distributions t and o, and it is usually used as the cost function of binary logistic regression (when the output of the logistic regression is a scalar binary value, 0 or 1):

crossentropy(t, o) = -(t \log(o) + (1 - t) \log(1 - o))    (3)

where t is the target value and o is the output predicted by the model. When the logistic regression algorithm is used to classify data into multiple categories, the categorical cross-entropy function is usually used as the loss function:

H(p, q) = -\sum_x p(x) \log(q(x))    (4)

where p is the true distribution and q is the coded distribution (the one that results from the model's predictions at any given moment in time).

2.4.3 Stochastic Gradient Descent

Gradient descent is a first-order numerical optimization algorithm that is commonly used to train differentiable models (Ruder, 2016). A model can be trained by defining a cost function, which is then minimized by the gradient descent algorithm, which updates the model parameters at each time step by applying the following rule:

\theta := \theta - \eta \frac{\partial}{\partial \theta} J(\theta)    (5)

where η is the learning rate, θ is the parameter that is going to be learned and J is the cost function. This rule is applied iteratively and changes the value of the parameter until the model converges to a local minimum of the cost function.

2.4.4 Model overfitting and underfitting

In a statistical model, overfitting is the term used to describe a model that captures the noise or random error in the data rather than the true underlying relationship between the input vector and the output value. Overfitting usually occurs when the model is either "too big" or the training data that is passed to the training algorithm is not enough.

2.4.5 Dynamic Learning Rate

It is a common practice to decay the learning rate during the training process, much like in the Simulated Annealing algorithm (Kirkpatrick and Gelatt, 1983). The following rule, if applied at each training epoch t, will exponentially decay the learning rate:

\eta_{t+1} := r \eta_t    (6)

where r is a constant that denotes the decay ratio of the learning rate at each epoch.
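To tie Sections 2.4.1-2.4.5 together, the sketch below trains the binary model of equations (1)-(3) with the update rule (5) and the decay rule (6). For brevity it uses plain batch gradient descent rather than SGD with momentum, and the data and hyper-parameters are illustrative.

```python
import numpy as np

def train_logreg(X, t, epochs=200, lr=0.5, decay=0.99):
    # X: (n_samples, n_features) milestone vectors; t: 0/1 risk labels.
    n, d = X.shape
    W, b = np.zeros(d), 0.0
    for _ in range(epochs):
        y = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # equations (1)-(2)
        # Gradient of the binary cross-entropy (3) w.r.t. W and b.
        grad_W = X.T @ (y - t) / n
        grad_b = np.mean(y - t)
        W -= lr * grad_W                         # update rule (5)
        b -= lr * grad_b
        lr *= decay                              # learning-rate decay (6)
    return W, b

X = np.array([[1.0, 0.0], [2.0, 1.0], [0.5, 3.0], [0.2, 4.0]])
t = np.array([1.0, 1.0, 0.0, 0.0])
W, b = train_logreg(X, t)
```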
3 Adaptive Risk Analysis Tool

Before we describe our system, we need to explain in detail what the problem we address is, which parameters we have to consider and which restrictions/constraints we have to take into account. The following sections describe the problem, the data relevant to the problem and how it is organized, as well as the explored techniques.

3.1 Problem Formulation

Our problem consists of performing risk analysis at the "milestone" level. In order to perform risk analysis on a particular milestone that is being planned at a certain point in time, we must find similar milestones in the project manager's previous history and check their registered problem occurrences.

3.2 Data Models

Creating a solution that helps teams to manage their projects by providing suggestions and identifying risks based on each user's previous history requires storing information in several appropriate data models. The structure of these models and their purpose are described below. A Project is divided into Milestones and each Milestone is divided into Tasks. These three data models store the information necessary to keep track of the structure of the projects in our platform. A milestone can be independent from all the others, but can also be connected to another milestone that must have been completed previously. Figure 1 shows a UML representation of the data models that are detailed below.

Figure 1: UML representation of the data models.

• Project (Name, Start Date, Due Date)
• Milestone (Name, Project (FK), Previous Milestone (FK), Type, Start Date, Due Date)
• Task (Name, Project (FK), Milestone (FK), Type, Expected Duration, Due Date, Status, Complete Date)
• Problem (Name)
• ProblemOccurrence (Problem (FK), Milestone (FK), Task (FK), Date)
• ChangeEstimate (Milestone (FK), Task (FK), Date, Old Estimate, New Estimate)

We also need models that store the feedback from the user throughout the project. The ProblemOccurrence model has the purpose of storing problems that any team member may experience during the project, while the ChangeEstimate model has the purpose of keeping track of the changes in the duration of tasks made by the project manager or by any user assigned to that task.
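Since the web platform is built on Django (Section 4.2.1), the models above could be declared roughly as in the sketch below. Field types, names and options are illustrative and may differ from the actual implementation.

```python
from django.db import models

class Project(models.Model):
    name = models.CharField(max_length=200)
    start_date = models.DateField()
    due_date = models.DateField()

class Milestone(models.Model):
    name = models.CharField(max_length=200)
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    previous_milestone = models.ForeignKey(
        'self', null=True, blank=True, on_delete=models.SET_NULL)
    type = models.CharField(max_length=50)
    start_date = models.DateField()
    due_date = models.DateField()

class Task(models.Model):
    name = models.CharField(max_length=200)
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    milestone = models.ForeignKey(Milestone, on_delete=models.CASCADE)
    type = models.CharField(max_length=50)
    expected_duration = models.DurationField()
    due_date = models.DateField()
    status = models.CharField(max_length=20)
    complete_date = models.DateField(null=True, blank=True)

class Problem(models.Model):
    name = models.CharField(max_length=200)

class ProblemOccurrence(models.Model):
    problem = models.ForeignKey(Problem, on_delete=models.CASCADE)
    milestone = models.ForeignKey(Milestone, on_delete=models.CASCADE)
    task = models.ForeignKey(Task, null=True, blank=True, on_delete=models.SET_NULL)
    date = models.DateField()

class ChangeEstimate(models.Model):
    milestone = models.ForeignKey(Milestone, on_delete=models.CASCADE)
    task = models.ForeignKey(Task, on_delete=models.CASCADE)
    date = models.DateField()
    old_estimate = models.DurationField()
    new_estimate = models.DurationField()
```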

3.3 Techniques Explored

In this section, we provide a description of the techniques used to develop a machine learning approach that helps project managers to plan their projects. As we want to perform a risk analysis per milestone, we need to compare the milestone currently being planned by the project manager with previous milestones that are stored in the system. In order to do this, we must list the features that will be used to create a feature vector for each milestone.

The features of each milestone include the number of users, the number of tasks, the duration, the type, the average duration of its tasks, the duration of the project, the order of the milestone in the project, the standard deviation of the duration of the tasks and a histogram of the task types.

3.3.1 Feature representation

The Type of a milestone is a categorical feature which can be any value c in the set of milestone types L_MilestoneTypes. Let l = {A, B, C, D} be the list of milestone types of a particular project manager. Without any prior information about similarities between types, we must assume that all the types are equally different from each other. If we choose to represent each category by its index in the list (a single integer value), we run into problems when using distance metrics such as the Euclidean distance, as the distance between type A and type B becomes lower than the distance between type A and type D.

In order to address this issue, the milestone Type feature can be represented as a one-hot vector, which is a vector that has zero magnitude in all its dimensions but one, where the magnitude is equal to 1. This vector v ∈ R^k, where k is the number of categories, has magnitude 1 in the dimension that corresponds to the index of the Type in the set of possible milestone types. As an example, to represent the type C ∈ l we use the following one-hot vector:

F_{Type} = [0, 0, 1, 0]^T    (7)

3.3.2 Learning algorithms

In order to solve our problem, we use two types of machine learning algorithms: instance-based learning algorithms and regression models.

3.3.3 Nearest neighbor algorithms

One of the approaches used to solve our problem is to use nearest neighbor search in order to find similar milestones and perform risk analysis during the planning of the current project.
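As a short illustration of how a milestone could be encoded before this nearest neighbor search, the sketch below combines the numeric features listed in Section 3.3 with the one-hot Type of equation (7) and a task-type histogram. The dictionary keys and the task-type list are illustrative, not the platform's actual schema.

```python
import numpy as np

MILESTONE_TYPES = ['A', 'B', 'C', 'D']   # l, the user's milestone types
TASK_TYPES = ['dev', 'test', 'docs']     # illustrative task types

def one_hot(value, categories):
    # Equation (7): a vector with a single 1 at the index of the category.
    v = np.zeros(len(categories))
    v[categories.index(value)] = 1.0
    return v

def milestone_vector(m):
    # Numeric features, followed by the one-hot Type and a task-type histogram.
    durations = np.array(m['task_durations'], dtype=float)
    histogram = np.array([m['task_types'].count(t) for t in TASK_TYPES], dtype=float)
    numeric = np.array([
        m['num_users'], len(durations), m['duration'], durations.mean(),
        m['project_duration'], m['order'], durations.std(),
    ])
    return np.concatenate([numeric, one_hot(m['type'], MILESTONE_TYPES), histogram])

example = {'num_users': 4, 'duration': 30.0, 'project_duration': 120.0,
           'order': 2, 'type': 'C', 'task_durations': [5.0, 8.0, 3.0],
           'task_types': ['dev', 'dev', 'test']}
print(milestone_vector(example))
```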
3.3.4 Top milestone risks

The output of the nearest neighbor algorithm consists of the k nearest neighbors of the query point. In our domain, the nearest neighbors are milestone vectors. We check whether the milestones associated with these vectors have any problem occurrences associated with them. In our data models, problem occurrences are associated with individual tasks. As a milestone is composed of multiple tasks, a milestone can have a problem of a certain type associated with itself multiple times. If so, we then check what the most common problem associated with that milestone is.

3.3.5 Representing Problem Occurrences

As our system is designed to be an open web platform used by many project managers working at the same time, and we need to provide user-specific suggestions, we needed a way of representing the problem occurrences in our algorithms. In a real-world scenario a wide variety of problem types may occur during the course of a project. We represent problem types as integers, where each integer is the index of a problem in the user-specific problem list. For example, if a user experienced problems of types A, B and C in his/her past history, then the problem-integer mapping becomes {A: 0, B: 1, C: 2}.

3.3.6 Feature selection and weighting

Nearest neighbor algorithms such as k-NN can sometimes face the curse of dimensionality problem, where in particular scenarios only a few features in the feature vector are meaningful for the problem at hand. In our domain, a project manager may plan a project and only face project management-related problems on certain kinds of milestones, such as milestones of a certain "duration" in time (i.e. short milestones). In our vector representation, the "duration" of the milestone is represented as one dimension of the vector space, while the rest of the vector represents features that are not really relevant in this particular scenario.

In order to solve this problem I propose a regression-based feature selection, where a coefficient is assigned to each feature. This coefficient represents "how important" the feature is to a particular user. A large coefficient reinforces the importance of a particular feature in the feature vector, while a smaller coefficient reduces it. This technique can be thought of as expanding and contracting the vector space in order to manipulate the nearest neighbor search.

As each project manager has his/her own work history and experiences, feature weighting must be done individually for each project manager. To do this we create a training set consisting of the milestone vectors that are stored in the system and were planned by that particular project manager. The logistic regression model is trained using the SGD algorithm with momentum and converges in real time in real-world scenarios, as the number of stored milestones of each project manager is usually small enough for this computation to be quick. In our current application of the logistic regression algorithm, we are not interested in the model's predictions but rather in the learned parameters w_r. Each parameter w_r serves as the "importance" coefficient for the r-th dimension of the milestone vector.
3.3.7 Regression models

We also tried an alternative approach to the nearest neighbor algorithms by building a logistic regression model that is able to predict a potential project risk from an input milestone (i.e. the milestone that a project manager has just introduced into the system). In order to do this we build a training set T consisting of (x, y) tuples, where x is a milestone vector and y is an integer that represents the problem instance that occurred when the project team was working on that particular milestone.

The probability of each problem type happening can be calculated with expression (8):

P(Y = i | x, W, b) = softmax_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}    (8)

A softmax activation function is used, as it is a generalization of the logistic function to multi-categorical classification (Bishop, 2006). The top risk can then be predicted with the following expression:

y_{pred} = argmax_i P(Y = i | x, W, b)    (9)

The model is trained using the SGD algorithm with momentum and is able to predict risks in a simple manner. This logistic regression model differs from the one used in the previous subsection due to the fact that we are now performing multi-categorical classification instead of just classifying whether an input milestone has a potential risk or not.

The model learns how to predict a risk from an input milestone, but the model itself results from global optimization, so it no longer has information about every single training example. This is a limitation of this type of technique in contrast to instance-based learning methods, where all the information is stored and, in particular scenarios, may provide better information to the project manager (e.g. when two problems occurred during the course of a particular past milestone and can now occur again, but the model does not have the information to predict it).

3.3.8 Hybrid approach

Another interesting way of approaching our problem is to provide a hybrid solution that uses both instance-based learning and regression models. Even though with regression models we can no longer access the user's past experiences, we are still able to predict the top risk associated with an input milestone vector.

A hybrid solution may consist of building a table of potential risks returned by both algorithms and weighting the contribution of each algorithm across all risks. The top risk will be the one with the greatest contribution.

3.3.9 Penalizing older experiences

As this system is made to be used by a project manager to plan projects, we must account for the fact that the Project Manager (PM) will learn from his/her past experience. For example, initially the project manager may begin by underestimating the real amount of time that is necessary to complete a certain kind of task, but over time he/she will learn how to make better estimates. The consequence of this human factor is that when our system is trying to perform risk analysis, querying the most similar past milestone records may not always be enough to return the best possible results.

To address this issue, we can penalize query results by the relative time at which those milestones were introduced into the system, and give greater importance to recent results, which we can assume are more adequate to the project manager's current expertise. To achieve this goal, many possibilities can be tried. For example, the following expression:

c = \frac{k}{d(x_1, x_2) \cdot \Delta t}    (10)

represents a possible decay in distance and time, where the greater the distance in space d(x_1, x_2) and in time Δt between the query point and each point in the dataset, the smaller the importance it will have. It is very important to consider the distance between the two vectors and not just the time difference, in order to account for scenarios where an older milestone is very similar to the one that the project manager is currently planning and no similar milestones were planned recently. We want to factor this scenario into our system and give a greater penalty to milestones that are further away in both space and time.
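A small sketch of the penalty in equation (10) follows, assuming each stored milestone carries the date on which it was introduced into the system; the constant k and the smoothing terms are illustrative choices.

```python
import numpy as np
from datetime import date

def decayed_relevance(query_vec, query_date, stored):
    # Equation (10): c = k / (d(x1, x2) * dt). Older and more distant milestones
    # get a smaller coefficient. The constant k and the +1 / 1e-9 smoothing
    # terms are illustrative, not fixed by the thesis.
    k = 1.0
    scored = []
    for vec, introduced_on, top_problem in stored:
        d = float(np.sqrt(np.sum((np.asarray(vec) - np.asarray(query_vec)) ** 2)))
        dt = (query_date - introduced_on).days + 1   # time penalty in days
        scored.append((k / ((d + 1e-9) * dt), top_problem))
    return sorted(scored, reverse=True)

history = [([5.0, 12.0, 30.0], date(2016, 3, 1), 'P'),
           ([5.0, 11.0, 29.0], date(2017, 2, 1), 'Q')]
print(decayed_relevance([5.0, 11.5, 29.5], date(2017, 3, 1), history))
```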
4 Architecture and Implementation

In this section, the details of the architecture used in our solution will be described. The system was built with the goal of being versatile and easily modified, so that this work can be expanded in the future by introducing new modules that may help to improve this solution and turn it into an even more valuable system.

4.1 General Overview

Our system is divided into three main independent components, as shown in Figure 2:

• Web Platform - used by the end users (project managers) to manage their projects, introduce information and get feedback;
• Database - has the purpose of storing all the data in a persistent way;
• ProjManagerLearnerOpt - the module responsible for handling all the queries that require responses based on machine learning.

Figure 2: ProjManager architecture schema (Web Platform, Database and ProjManagerLearnerOpt).

The arrows in Figure 2 represent data flows between the components. The interactions with the database are read/write actions, but the interaction between the web platform and the software that generates the solution is just an activation interaction. In this case, the platform starts the software; however, the result is written to the database and not directly returned to the web platform.

In the following sections, we present the details of the web platform as well as of the software responsible for the machine learning. The database component is used just to store data; there is no logic associated with this component.

4.2 Web Platform

After considering multiple options, it was decided that the system would run as a web platform. A web platform makes it possible for its users to access it from anywhere in the world, with any device with a web browser, including mobile devices. It is important that a system whose goal is to help teams manage their projects also has an interface that can be accessed very easily.

In the platform a user can Create, Change and Delete Projects, Milestones and Tasks, as well as report Occurrences.

4.2.1 Implementation

The web platform was developed using the Django framework, a high-level web framework that follows the model-view-template (MVT) design pattern (Burch, 2010). Django was chosen because it is a simple but powerful open-source framework with a very big community; it has exhaustive documentation and a large number of available plug-ins that can be used to extend the platform. It supports many database management systems, including PostgreSQL, MySQL and SQLite. The Django framework is also cross-platform, and Django projects can easily be deployed to the most common web servers, including Apache and Nginx.

As this platform's goal is to help teams manage their projects, it is of utmost importance to provide a pleasant, straightforward and good-looking user interface so that the end users will actually use this system. To respond to this requirement, we used the Twitter Bootstrap HTML, CSS and JS framework (http://getbootstrap.com/). Bootstrap can be used to create UIs that are appealing to the users and that look good on a wide range of screen sizes, including on mobile devices.
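As an illustration of the interaction described in Section 4.1, a Django view on the web platform could expose the risk information that the machine learning component has already written to the database. This is only a sketch under that assumption: the view name, URL wiring and returned fields are hypothetical, and the hand-off that activates ProjManagerLearnerOpt is omitted.

```python
from django.http import JsonResponse
from .models import Milestone, ProblemOccurrence

def milestone_risks(request, milestone_id):
    # Illustrative endpoint: the heavy computation is expected to have been
    # done by the ProjManagerLearnerOpt worker, which writes its results to
    # the database; the view only reads and returns what is stored there.
    milestone = Milestone.objects.get(pk=milestone_id)
    occurrences = ProblemOccurrence.objects.filter(milestone=milestone)
    return JsonResponse({
        'milestone': milestone.name,
        'reported_problems': [o.problem.name for o in occurrences],
    })
```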
4.3 Database

The database component is used to store information in a persistent and consistent way and has no special features or functionalities. The web platform is connected to the database and reads and writes data. The database is a key component of this platform, as it is where all the data provided by the users is stored.

4.3.1 Implementation

We do not need any particular platform-specific features to develop or deploy our solution. We just need the basic database operations to access and manipulate the data: Select, Insert, Update and Delete.

4.3.2 Model

This project uses a relational database as its storage system. There are currently many other possibilities, but relational databases were chosen because they are widely used and they allow the practical implementation of the models presented in Section 3.2 with minor changes. It is not in the scope of this dissertation to explore alternative methods of storing information.

4.3.3 Database Management System

There are plenty of DBMSs that implement a relational model. Despite having different features, all of those DBMSs have a common base, and we are only interested in those base features, so our choice of DBMS was again based on easy access to infrastructure.

To develop this platform we used SQLite, which is a very lightweight database management system that stores information in a single small file on the OS file system, making it very portable. SQLite has the advantage of not requiring any particular configuration or database server. On the other hand, SQLite has poor performance and does not support user management, but these issues are not relevant during development time.

4.4 ProjManagerLearnerOpt

The ProjManagerLearnerOpt component is used to perform all the machine learning operations of the platform. This component is connected with both the database and the web platform. When a user performs an operation that requires a machine learning response (for example, the user has just finished planning a project milestone and wants to perform risk analysis on that particular milestone), the web platform queries the ProjManagerLearnerOpt system, which then makes database queries, computes the result using the appropriate algorithms and returns the response to the web platform.

The reason why the ProjManagerLearnerOpt component is kept separate from the web platform (not only in the system architecture, but also in the implementation itself) is to make it possible for the platform to scale while in production. The web platform could itself do all the tasks that the ProjManagerLearnerOpt component does, but we want to make sure that the machine learning component, which performs all the heavy computation, can easily be replicated onto several servers, depending on the number of users using the platform at a particular point in time. In this scenario the web platform would use a load balancer that would decide which ProjManagerLearnerOpt server is available to serve the user's requests.

4.4.1 Implementation

The ProjManagerLearnerOpt module was developed in the Python language, just like the web platform module. The reason for this choice is that Python is one of the most used programming languages for machine learning and data science, containing a great amount of highly optimized and actively maintained machine learning libraries, such as scikit-learn, TensorFlow, Theano, Pylearn2 and Caffe.

In this module we often need to perform many mathematical computations, and for that we use Python's NumPy library, which is the fundamental library for scientific computing in Python. As Python is an interpreted language, code runs slower than in compiled programming languages such as C and C++. To address this issue, the NumPy library has many of its operations implemented in C, bringing much faster running times to Python (van der Walt et al., 2011).

5 Evaluation

In the previous sections we described the architecture and the inner workings of our solution, which has the goal of helping IT project managers to better plan their projects. It is, therefore, of utmost importance to assess the quality of the developed solution by performing evaluation tests and drawing conclusions from the obtained results.

5.1 Evaluation Sets and Environment

In order to evaluate our solution we programmatically created several datasets. The task of helping project managers during the planning of their projects depends not only on the project type and features, but also on the project manager himself/herself. An inexperienced project manager can, for example, underestimate the time that is necessary to complete a certain type of task and therefore create delays in the delivery of a milestone and, thus, of the project itself. As the learning task is dependent on the project manager himself/herself, it is necessary to create datasets that take this situation into account.

The datasets consist of several lists of Projects, Milestones and Tasks, where each milestone can be associated with at least one Problem instance (which represents an occurrence that happened during the course of that milestone). Each of these components, as described in Section 3, has several defining properties. Assigning random values to all the properties of the instances of these models would render the learning task impossible and useless. The properties of the milestones and tasks are therefore assigned values that lie within intervals of realistic values. In order to make the dataset useful and interesting, we introduce certain "biases" into the dataset generator algorithm. For example, milestones of type "A" with more than "n" tasks have a "k%" probability of reporting the occurrence of a problem of type "P".

Another important factor to consider while building the datasets is that project managers learn over time. For example, an inexperienced project manager that uses our platform may underestimate the time needed to complete the milestones in the beginning, but may learn how to correctly estimate their duration over time. To address this real-life scenario, part of the evaluation sets are also generated taking this into account.
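A hedged sketch of the kind of generator just described is shown below; every threshold, probability and property in it is illustrative and not the value used in the actual evaluation.

```python
import random

def generate_dataset(num_milestones=30, pm_learns=False, seed=0):
    # Each generated milestone gets realistic-looking properties plus a bias:
    # type 'A' milestones with many tasks often report problem 'P'.
    rng = random.Random(seed)
    milestones = []
    for i in range(num_milestones):
        m_type = rng.choice(['A', 'B', 'C', 'D'])
        num_tasks = rng.randint(3, 12)
        # An inexperienced PM underestimates durations; if pm_learns is set,
        # the estimation error shrinks as more milestones are planned.
        error = 0.5 if not pm_learns else 0.5 * (1 - i / num_milestones)
        estimated = rng.uniform(10, 40)
        real = estimated * (1 + error)
        problem = None
        if m_type == 'A' and num_tasks > 8 and rng.random() < 0.7:
            problem = 'P'
        milestones.append({'type': m_type, 'tasks': num_tasks,
                           'estimated': estimated, 'real': real,
                           'problem': problem})
    return milestones
```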
5.2 Solution quality

5.2.1 Accuracy

In Table 2, we present the accuracy of the several applied techniques when used to predict the top potential problem of a project's milestone. All these methods were tested against the automatically generated datasets. The k-NN + TIME method is a k-NN classifier that gives more importance to the recent history of the team rather than the old one (unless the old records are much "closer" to the query point). The k-NN + FEATURE method consists of k-NN with feature weighting performed by a logistic regression model. The k-NN + FT + TIME method is a feature-weighted k-NN algorithm that also gives more importance to recent milestone entries. The LOGREG method consists of a logistic regression model that predicts the top potential problem for the input milestone.

From the results we obtained, we observe that all the techniques used are learning and presenting good results. The k-NN algorithm with feature weighting seems to perform very well when little data is present, as in Dataset 1. Logistic regression models seem to perform decently, although a little more poorly when fewer training examples are available.

Logistic regression also performs poorly on datasets where the user learns over time, as we expected, since it is trained with the entire user history, making it somewhat inadequate in real-world scenarios.

5.3 Extensibility of the solution

Another very important point of our solution, mainly on the web platform, is the fact that it was designed to be extended in a simple way. We employ a mechanism for connecting the web platform to our machine learning back-end that is generic enough to be replaced by a different one with a minimum amount of effort from the developers.

6 Conclusions

6.1 Summary

In this dissertation we proposed a system that has the goal of helping project managers to improve their planning by performing risk analysis based on their previous professional history, every time they are planning a project milestone. Our literature review showed that there are two main types of machine learning algorithms that contribute to solving this problem: instance-based learning and regression models. We developed models using both approaches. Our tests show that both techniques are able to give quite satisfactory results when applied to real-world scenarios. We were able to integrate these algorithmic solutions into a platform that can be used by project managers and their teams in order to be more efficient and to improve their project success rates by discovering and mitigating potential risks before they happen.

6.2 Achievements

This work allowed us to develop a solution that project managers can use to prevent potential risks based on their previous experience and work history. The evaluation of our work showed that the machine learning algorithms are able to provide very satisfactory results while dealing with different kinds of human factors, such as the fact that project managers tend to learn from their own mistakes over time. Our solution takes all these human factors into account with the goal of providing the best possible risk analysis and prevention solution.

6.3 Future Work

We consider that our work can have several future improvements in order to achieve better results. One direction that we think would be interesting to explore in the future, after this system is fully deployed and used by many dozens of users in real-life scenarios, would be to study global patterns in the users' history. This could be achieved by applying machine learning algorithms (e.g. clustering algorithms like k-Means Clustering) to divide the users into several clusters. Each of these individual clusters would represent project managers and teams with similar experiences and profiles.

We speculate that this could be very helpful when performing risk analysis on types of tasks and milestones that a specific user has never performed before. According to that user's own profile, the system could predict several potential project planning risks based on previous experiences from other users of the platform with a similar profile and history.
              Number of projects   Number of milestones   Number of tasks   PM learns over time
Dataset 1     1                    16                     100               No
Dataset 2     3                    30                     200               No
Dataset 3     3                    30                     200               Yes

Table 1: Description of the datasets used to test our algorithms against different scenarios

                    Dataset 1   Dataset 2   Dataset 3
k-NN (k=3)          75%         83.33%      22.22%
k-NN + FEATURE      100%        100%        33.33%
k-NN + TIME         75%         83.33%      83.33%
k-NN + FT + TIME    100%        100%        100%
LOGREG              87.5%       100%        22.22%

Table 2: Accuracy of each algorithm on each dataset

List of Acronyms

k-NN  k-Nearest Neighbors
PM    Project Manager
SGD   Stochastic Gradient Descent

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

Aha, D. W., Kibler, D., and Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6(1):37-66.

Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46:175-185.

Bellman, R. E. (2003). Dynamic Programming. Dover Publications, Incorporated.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.

Boehm, B. W. et al. (1987). A Spiral Model of Software Development and Enhancement. Computer (Long Beach, Calif.), 21(May):61-72.

Burch, C. (2010). Django, a web framework using Python: Tutorial presentation. J. Comput. Sci. Coll., 25(5):154-155.

Freedman, D. (2005). Statistical Models: Theory and Practice.

Inza, I., Larrañaga, P., and Sierra, B. (2002). Feature Weighting for Nearest Neighbor by Estimation of Distribution Algorithms, pages 295-311. Springer US, Boston, MA.

Kirkpatrick, S. and Gelatt, J. R. (1983). Optimization by simulated annealing.

McConnell, S. (1996). Rapid Development: Taming Wild Software Schedules. Microsoft Press, Redmond, WA, USA, 1st edition.

Project Management Institute (2004). A Guide To The Project Management Body Of Knowledge (PMBOK Guides). Project Management Institute.

Royce, W. (1970). Managing the development of large software systems. Proceedings of IEEE WESCON.

Ruder, S. (2016). An overview of gradient descent optimization algorithms. Web page, pages 1-12.

Sutskever, I., Martens, J., Dahl, G. E., and Hinton, G. E. (2013). On the importance of initialization and momentum in deep learning. JMLR W&CP, 28(2010):1139-1147.

Tahir, M. A., Bouridane, A., and Kurugollu, F. (2007). Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier. Pattern Recognition Letters, 28(4):438-446.

The Bull Survey (1998). The Bull Survey. London: Spikes Cavell Research Company.

Theano Development Team (2016). Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688.

van der Walt, S., Colbert, S. C., and Varoquaux, G. (2011). The NumPy array: a structure for efficient numerical computation. CoRR, abs/1102.1523.

Wilson, J. M. (2003). Gantt charts: A centenary appreciation. European Journal of Operational Research, 149(2):430-437.

Zou, K. H., Tuncali, K., and Silverman, S. G. (2003). Correlation and simple linear regression. Radiology, 227(3):617-628.