ML Notes UT-1
D. Support vector machines: (SVMs) Support vector machine algorithms are used to
accomplish both classification and regression tasks. These are supervised machine
learning algorithms that plot each piece of data in the n-dimensional space, with n
referring to the number of features. Each feature value is associated with a coordinate
value, making it easier to plot the features. Classification is then performed by
determining the hyper-plane that distinctly separates the two classes of support vectors.
A good separation ensures a good classification between the plotted data points.
In simple words, support vectors are the coordinates of individual observations. These are
popular machine learning classifiers used in applications such as data classification, facial
expression classification, text classification, steganography detection in digital images,
speech recognition, and others.
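As a minimal sketch of this idea (assuming scikit-learn is available; the iris data and the
linear kernel are illustrative choices, not from these notes), an SVM classifier can be
trained in a few lines:

# SVM sketch with scikit-learn; dataset and kernel are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                 # 4 features -> points in 4-dimensional space
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear")                        # find the separating hyper-plane
clf.fit(X_train, y_train)
print("support vectors per class:", clf.n_support_)
print("test accuracy:", clf.score(X_test, y_test))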
E. Naive Bayes algorithm: Naive Bayes refers to a probabilistic machine learning
algorithm based on the Bayesian probability model and is used to address classification
problems. The fundamental assumption of the algorithm is that features under
consideration are independent of each other and a change in the value of one does not
impact the value of the other.
For example, you can consider a ball to be a cricket ball if it is red, round, has a
diameter of 7.1-7.26 cm, and has a mass of 156-163 g. Although all these features could be
interdependent, each one contributes to the probability that it is a cricket ball. This is the
reason the algorithm is referred to as ‘naïve’. Let’s look at the mathematical
representation of the algorithm.
If X and Y are probabilistic events, P(X) is the probability of X being true, and P(X|Y) is
the conditional probability of X being true given that Y is true, then Bayes’ theorem is
given by the equation:

P(X|Y) = (P(Y|X) × P(X)) / P(Y)
A naive Bayesian approach is easy to develop and implement. It is capable of handling
massive datasets and is useful for making real-time predictions. Its applications include
spam filtering, sentiment analysis and prediction, document classification, and others.
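As a hedged illustration (the probabilities and the scikit-learn GaussianNB classifier
below are assumptions introduced for demonstration, not part of these notes), Bayes’
theorem can be applied directly or through a ready-made naive Bayes implementation:

# Direct use of Bayes' theorem with made-up illustrative probabilities.
p_x = 0.3                                  # P(X)
p_y = 0.5                                  # P(Y)
p_y_given_x = 0.8                          # P(Y|X)
print("P(X|Y) =", p_y_given_x * p_x / p_y) # Bayes' theorem -> 0.48

# Naive Bayes classifier sketch using scikit-learn (assumed available).
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)             # treats features as independent given the class
print("training accuracy:", model.score(X, y))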
F. KNN classification algorithm: The K Nearest Neighbours (KNN) algorithm is used for
both classification and regression problems. It stores all the known use cases and
classifies new use cases (or data points) by segregating them into different classes. This
classification is accomplished based on the similarity of the new use cases to the stored
ones. KNN is a supervised machine learning algorithm, wherein ‘K’ refers to the number
of neighbouring points considered while classifying a new data point. The algorithm
performs no explicit training and uses the stored cases directly at each classification,
thereby eliminating the need for any specific learning phase. The classification is based
on the neighbours’ majority vote.
The algorithm uses these steps to perform the classification (a short sketch in code follows the list):
• For a training dataset, calculate the distance between the data points that are to
be classified and the rest of the data points.
• Choose the closest ‘K’ elements based on the distance or function used.
• Consider a ‘majority vote’ among the K points: the class or label held by the
majority of these K points becomes the predicted class.
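A from-scratch sketch of these three steps (NumPy, the tiny 2-D dataset, and K = 3 are
illustrative assumptions):

# KNN following the steps above; data and K are made up for illustration.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)        # step 1: distances
    nearest = np.argsort(dists)[:k]                        # step 2: K closest elements
    return Counter(y_train[nearest]).most_common(1)[0][0]  # step 3: majority vote

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # -> 0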
H. Random Forest algorithm: The Random Forest algorithm uses multiple decision trees to
handle classification and regression problems. It is a supervised machine learning
algorithm where different decision trees are built on different samples during training.
These algorithms help estimate missing data and tend to keep the accuracy intact in
situations when a large chunk of data is missing from the dataset. Random forest
algorithms follow these steps (a short sketch follows the list):
• Select random data samples from a given data set.
• Build a decision tree for each data sample and provide the prediction result for
each decision tree.
• Carry out voting for each predicted result.
• Select the most voted prediction as the final result.
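A short sketch of these steps (assuming scikit-learn; the dataset and the number of trees
are illustrative choices):

# Random forest sketch: each tree is built on a random bootstrap sample
# (steps 1-2) and predictions are combined by majority vote (steps 3-4).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))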
I. Artificial neural networks (ANNs): Artificial neural networks are machine learning
algorithms that mimic the human brain (neuronal behaviour and connections) to solve complex
problems. ANN has three or more interconnected layers in its computational model that process
the input data. The first layer is the input layer or neurons that send input data to deeper layers.
The second layer is called the hidden layer. The components of this layer change or tweak the
information received through various previous layers by performing a series of data
transformations. These are also called neural layers. The third layer is the output layer that sends
the final output data for the problem. ANN algorithms find applications in smart home and home
automation devices such as door locks, thermostats, smart speakers, lights, and appliances. They
are also used in the field of computer vision, specifically in detection systems and
autonomous vehicles.
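A minimal three-layer sketch (input layer, one hidden layer, output layer), assuming
scikit-learn's MLPClassifier; the digits dataset and the hidden-layer size are arbitrary
illustrative choices:

# ANN sketch: 64 input neurons (pixels) -> 32 hidden neurons -> 10 outputs.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
ann.fit(X_train, y_train)                 # hidden layer transforms the input data
print("test accuracy:", ann.score(X_test, y_test))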
J. Recurrent neural networks (RNNs): Recurrent neural networks refer to a specific type of
ANN that processes sequential data. Here, the result of the previous step acts as the input to the
current step. This is facilitated via the hidden state that remembers information about a sequence.
It acts as a memory that maintains the information on what was previously calculated. The
memory of RNN reduces the overall complexity of the neural network.
RNN analyses time series data and possesses the ability to store, learn, and maintain contexts
of any length. RNN is used in cases where time sequence is of paramount importance, such as
speech recognition, language translation, video frame processing, text generation, and image
captioning. Even Siri, Google Assistant, and Google Translate use the RNN architecture.
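The core recurrence can be sketched in a few lines of NumPy (the sizes and random
weights are illustrative assumptions; a real RNN learns these weights from data):

# Bare-bones recurrent step: the hidden state h carries memory of the sequence.
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
W_xh = rng.normal(size=(hidden_size, input_size))   # input  -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden ("memory")

h = np.zeros(hidden_size)                  # hidden state, initially empty
sequence = rng.normal(size=(5, input_size))
for x_t in sequence:
    h = np.tanh(W_xh @ x_t + W_hh @ h)     # previous result feeds the current step
print("final hidden state:", h)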
1.5 Real-life applications:
1. Healthcare industry: Machine learning is being increasingly adopted in the healthcare
industry, thanks to wearable devices and sensors such as fitness trackers, smart
health watches, etc. All such devices monitor users’ health data to assess their health in
real-time. Moreover, the technology is helping medical practitioners in analysing trends
or flagging events that may help in improved patient diagnoses and treatment. ML
algorithms even allow medical experts to predict the lifespan of a patient suffering from
a fatal disease with increasing accuracy. Additionally, machine learning is contributing
significantly to two areas:
• Drug discovery: Manufacturing or discovering a new drug is expensive and
involves a lengthy process. Machine learning helps speed up the steps involved in
such a multi-step process. For example, Pfizer uses IBM’s Watson to analyse
massive volumes of disparate data for drug discovery.
• Personalized treatment: Drug manufacturers face the stiff challenge of
validating the effectiveness of a specific drug on a large mass of the population.
This is because the drug works only on a small group in clinical trials and possibly
causes side effects on some subjects.
To address these issues, companies like Genentech have collaborated with GNS
Healthcare to leverage machine learning and simulation AI platforms for innovating
biomedical treatments. ML technology looks for patients’
response markers by analysing individual genes, which provides targeted therapies to
patients.
2. Finance sector: Today, several financial organizations and banks use machine
learning technology to tackle fraudulent activities and draw essential insights from vast
volumes of data. ML-derived insights aid in identifying investment opportunities that
allow investors to decide when to trade. Moreover, data mining methods help cyber-
surveillance systems zero in on warning signs of fraudulent activities, subsequently
neutralizing them. Several financial institutes have already partnered with tech
companies to leverage the benefits of machine learning. For example,
• Citibank has partnered with fraud detection company Feedzai to handle online
and in-person banking frauds.
• PayPal uses several machine learning tools to differentiate between legitimate and
fraudulent transactions between buyers and sellers.
3. Retail sector: Retail websites extensively use machine learning to recommend items
based on users’ purchase history. Retailers use ML techniques to capture data, analyse it,
and deliver personalized shopping experiences to their customers. They also implement
ML for marketing campaigns, customer insights, customer merchandise planning, and
price optimization. According to a September 2021 report by Grand View Research, Inc.,
the global recommendation engine market is expected to reach a valuation of $17.30
billion by 2028. Common day-to-day examples of recommendation systems include:
• When you browse items on Amazon, the product recommendations that you see
on the homepage result from machine learning algorithms. Amazon uses artificial
neural networks (ANN) to offer intelligent, personalized recommendations
relevant to customers based on their recent purchase history, comments,
bookmarks, and other online activities.
• Netflix and YouTube rely heavily on recommendation systems to suggest shows
and videos to their users based on their viewing history.
Moreover, retail sites are also powered with virtual assistants or conversational chatbots
that leverage ML, natural language processing (NLP), and natural language
understanding (NLU) to automate customer shopping experiences.
4. Travel industry: Machine learning is playing a pivotal role in expanding the scope of
the travel industry. Rides offered by Uber, Ola, and even self-driving cars have a robust
machine learning backend. Consider Uber’s machine learning algorithm that handles the
dynamic pricing of their rides. Uber uses a machine learning model called ‘Geosurge’ to
manage dynamic pricing parameters. It uses real-time predictive modelling on traffic
patterns, supply, and demand. If you are getting late for a meeting and need to book an
Uber in a crowded area, the dynamic pricing model kicks in, and you can get an Uber ride
immediately but would need to pay twice the regular fare. Moreover, the travel industry
uses machine learning to analyse user reviews. User comments are classified through
sentiment analysis based on positive or negative scores. This is used for campaign
monitoring, brand monitoring, compliance monitoring, etc., by companies in the travel
industry.
5. Social media: With machine learning, billions of users can efficiently engage on social
media networks. Machine learning is pivotal in driving social media platforms, from
personalizing news feeds to delivering user-specific ads. For example, Facebook’s auto-
tagging feature employs image recognition to identify your friend’s face and tag them
automatically. The social network uses ANN to recognize familiar faces in users’ contact
lists and facilitates automated tagging. Similarly, LinkedIn knows when you should apply
for your next role, whom you need to connect with, and how your skills rank compared
to peers. All these features are enabled by machine learning.
1.6 Learning Tasks
1. Descriptive Data Task:
This task is used to produce summaries such as correlations, cross-tabulations, and
frequency counts. These techniques are used to determine the similarities in the data
and to find existing patterns. One more application of descriptive analysis is to identify
interesting subgroups within the bulk of the available data.
This type of analytics emphasises the summarization and transformation of the data into
meaningful information for reporting and monitoring.
2. Predictive Data Task:
The main goal of this task is to say something about future outcomes rather than about
current behaviour. It uses supervised learning functions to predict a target value. The
methods in this category include classification, time-series analysis, and regression.
Modelling the data is a necessity for predictive analysis; it works by using a few variables
of the present to predict the unknown future values of other variables (see the
comparison and the short code sketch below).
Comparison of Descriptive and Predictive Data Tasks:
1. Basic: Descriptive determines what happened in the past by analysing stored data;
Predictive determines what can happen in the future with the help of past data analysis.
2. Preciseness: Descriptive provides accurate data; Predictive produces results that do
not ensure accuracy.
3. Practical analysis methods: Descriptive uses standard reporting, query/drill-down,
and ad-hoc reporting; Predictive uses predictive modelling, forecasting, simulation, and
alerts.
4. Requirements: Descriptive requires data aggregation and data mining; Predictive
requires statistics and forecasting methods.
5. Type of approach: Descriptive follows a reactive approach; Predictive follows a
proactive approach.
6. Description: Descriptive describes the characteristics of the data in a target data set;
Predictive carries out induction over the current and past data so that predictions can be
made.
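To make the contrast concrete, here is a miniature sketch (pandas, scikit-learn, and the
toy advertising data are assumptions introduced for illustration):

# Descriptive vs predictive in miniature; the toy data is made up.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"ads_spend": [10, 20, 30, 40], "sales": [25, 44, 61, 83]})

# Descriptive task: summarise what happened, for reporting and monitoring.
print(df.describe())                       # summary statistics
print(df.corr())                           # correlation between the columns

# Predictive task: use present variables to predict unknown future values.
model = LinearRegression().fit(df[["ads_spend"]], df["sales"])
print("predicted sales at spend 50:", model.predict(pd.DataFrame({"ads_spend": [50]})))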
1.7 Learning Paradigms:
1. Supervised Learning: Supervised learning is a form of machine learning which
effectively works as concept mapping. You have an input, and you get an output. As you
feed in data and assess it, you get a function which abstracts the system into rules that
probably make no sense to an actual person.
If an image contains a car, your expected output is a yes; if the image doesn’t contain a
car, you get a no. As the system learns, it can rule out non-cars with increasing
accuracy. When you teach certain concepts, you can’t directly explain them and expect
your audience to understand. You describe what makes that concept work but also give
examples. There is a level of inference that if you have this input, you get this specific
output, and they draw their own conclusions about what connects them. If you’re
teaching a kid what a dog is, you don’t explain the legs, the eyes, the ears, etc., you point
and say: “That’s a dog,” or: “That’s not a dog,” until they get it. You’re providing a task to
be done and examples of the right and wrong answer.
For our car example, this would mean we’ve codified images as containing a car or not,
and the system abstracts the pattern. The main limitation is that it relies entirely on the
training data. If all your pictures are red cars in the woods for the car set, a tomato with
green around it might be determined to be a car because there is a red shape and green
around it, or a blue car may register as a false negative. It found a correlation that fits a
pattern without fitting the pattern we wanted.
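The car example can be caricatured in code (the two made-up features standing in for
image content are purely an assumption; any supervised classifier would do):

# Supervised learning sketch: labeled examples in, an input -> output mapping out.
from sklearn.tree import DecisionTreeClassifier

X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]   # fake image features
y = [1, 1, 0, 0]                                       # 1 = car, 0 = no car

clf = DecisionTreeClassifier().fit(X, y)               # learn from the labels
print(clf.predict([[0.85, 0.15]]))                     # -> [1], "contains a car"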
2. Unsupervised Learning: Unsupervised learning is the opposite of supervised learning
in the sense that instead of assessing the accuracy of the training, you’re assessing the
results of the process. With supervised learning, you determine the target sets and
supervise the process of getting the results; here, you feed in a set and let the system
classify it, so it can determine whether a novel input is a member of a derived set or not.
This type of self-organization can lead
to some interesting results. You let the system sort things out in a way that makes the
clusters make sense based on the requirements.
For instance, if someone handed you a bunch of vegetables and asked you to sort them,
and you couldn’t tell what the purpose of the sorting was, you might go by colour.
Potatoes, carrots, etc. all have different colours but are all vegetables, though your
colour-based sorting would keep them apart. You’re providing data and letting the system make sense
of it based on some vague initial premise implied in the data. This approach is useful in
something like a medical application.
If you pass a bunch of similar cell slides to an algorithm, it might find something which
then correlates with cancer and can be explored further. While with supervised learning
we gave the system our desired categories for output, here we give the system our inputs
and let it make sense of it. You might find something that you didn’t even know to search
for as an output with this sort of system.
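A clustering sketch of this idea (k-means and the toy points are illustrative assumptions;
no labels are given to the system):

# Unsupervised learning sketch: the system derives the groups itself.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.2, 0.9], [0.9, 1.1],     # one natural clump
              [8, 8], [8.1, 7.9], [7.9, 8.2]])    # another natural clump

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("derived sets:", km.labels_)                 # clusters found without labels
print("novel input's set:", km.predict([[7.5, 8.5]]))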
3. Reinforcement Learning: Reinforcement learning is where the results of the learning
process are fed back into it. As your algorithm explores its environment, it learns more and tries
to accomplish some specific goal(s) it is rewarded for. The better it does at approaching
the goal, the more it is rewarded. Your algorithm reacts to the environment it is in to grow
and adapt to fit the rules of the system.
A real-life example of this is teaching a dog to walk on a leash. You don’t want the dog to
pull or fall behind, but the dog really doesn’t understand the rules. When the dog does
what it’s supposed to, you give them a reward and keep walking. When they don’t, they
don’t get to continue walking, which shows them they’ve violated the rules (ideally you
don’t unreasonably punish the dog, since they don’t understand, but it’s fine to do so to
an algorithm).
The dog learns the rules by trial and error and eventually knows instinctively what they
can and cannot do when you have their leash. Your algorithm adapts to the changes as
they come in rather than relying on a static input set. This approach makes little sense for
anything which doesn’t involve a continual process (a robot trying to walk and fighting gravity) or
a dynamic environment (a video game algorithm where there aren’t necessarily fixed
states). This learning system is based on trial and error rather than a fixed set of data.
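A tiny trial-and-error sketch (the one-dimensional corridor environment, rewards, and
rates below are all illustrative assumptions; this is tabular Q-learning, one common
reinforcement learning method):

# Q-learning on a 1-D corridor: the agent is rewarded for reaching the right end.
import numpy as np

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(500):                       # trial-and-error episodes
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0    # reward only at the goal
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("policy for states 0-3:", Q[:-1].argmax(axis=1))  # -> [1 1 1 1], move right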
1.8 Linear Discriminant Analysis (LDA):
In LDA, we project the d-dimensional samples x onto a direction w as z = wT x; m1 and m2
denote the class means and s1² and s2² the class scatters after projection.
After projection, for the two classes to be well separated, we would like the means to be
as far apart as possible and the examples of the classes to be scattered in as small a region
as possible. So, we want |m1 − m2| to be large and s1² + s2² to be small. Fisher’s linear
discriminant is the w that maximizes

J(w) = (m1 − m2)² / (s1² + s2²) = (wT SB w) / (wT SW w)

where SB = (m1 − m2)(m1 − m2)T is the between-class scatter matrix. The denominator is
the sum of scatter of examples of classes around their means after projection and can be
rewritten as

s1² + s2² = wT SW w, where SW = S1 + S2 is the within-class scatter matrix.

Taking the derivative of J with respect to w and setting it equal to 0, we get

w = c SW−1(m1 − m2)

where c is some constant. Because it is the direction that is important and not the
magnitude, we can just take c = 1 and find w.
Remember that when p(x|Ci) ∼ N (μi, Σ), we have a linear discriminant where
w=Σ−1(μ1−μ2), and we see that Fisher’s linear discriminant is optimal if the classes are
normally distributed. Under the same assumption, a threshold, w0, can also be calculated
to separate the two classes. But Fisher’s linear discriminant can be used even when the
classes are not normal. We have projected the samples from d dimensions to 1, and any
classification method can be used afterward. On two-dimensional synthetic data with
two classes, as expected, the LDA direction is superior to the PCA direction in terms of
the ease of discrimination afterwards, because LDA uses the class information.
In the case of K > 2 classes, we want to find the matrix W such that z = WT x, where z is k-
dimensional and W is d × k.
The within-class scatter matrix for Ci is Si = Σt∈Ci (xt − mi)(xt − mi)T, and the total
within-class scatter is SW = Σi Si. Denoting the overall mean by m and the number of
examples in Ci by Ni, the between-class scatter matrix is SB = Σi Ni (mi − m)(mi − m)T.
Thus, we are interested in the matrix W that maximizes

J(W) = |WT SB W| / |WT SW W|
The largest eigenvectors of Sw−1SB are the solution. SB is the sum of K matrices of rank 1,
namely, (mi − m)(mi − m)T, and only K − 1 of them are independent. Therefore, SB has a
maximum rank of K − 1 and we take k = K − 1. Thus, we define a new, lower, (K − 1)-
dimensional space where the discriminant is then to be constructed. Though LDA uses
class separability as its goodness criterion, any classification method can be used in this
new space for estimating the discriminants.
We see that to be able to apply LDA, SW should be invertible. If this is not the case, we can
first use PCA to get rid of singularity and then apply LDA to its result; however, we should
make sure that PCA does not reduce dimensionality so much that LDA does not have
anything left to work on.
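A NumPy sketch of the two-class case (w = SW−1(m1 − m2) with c = 1; the Gaussian toy
data is an illustrative assumption):

# Fisher's linear discriminant for two classes, from the formulas above.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))   # class 1 samples
X2 = rng.normal(loc=[3, 2], scale=1.0, size=(50, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - m1).T @ (X1 - m1)               # scatter of class 1 around its mean
S2 = (X2 - m2).T @ (X2 - m2)
SW = S1 + S2                               # within-class scatter matrix

w = np.linalg.solve(SW, m1 - m2)           # w = SW^-1 (m1 - m2), taking c = 1
z1, z2 = X1 @ w, X2 @ w                    # project from d = 2 dimensions to 1
print("projected class means:", z1.mean(), z2.mean())  # well separated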