DL Unit 1

Q) What is machine learning?

In the real world, humans learn from their experiences thanks to their learning capability, while computers and machines simply follow our instructions. But can a machine also learn from experience or past data the way a human does? This is where Machine Learning comes in.

Machine Learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms which allow a computer to learn from data and past experience on its own. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a summarized way as:

Machine learning enables a machine to automatically learn from data, improve its performance from experience, and predict things without being explicitly programmed.
Features of Machine Learning:
o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as it also deals with huge amounts of data.
o The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions on Facebook, and more. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.

……………… end…………

Q) What is Deep Learning?

Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. You can use deep learning methods to automate tasks that typically require human intelligence, such as describing images or transcribing a sound file into text.
Why is deep learning important?

Artificial intelligence (AI) attempts to train computers to think and learn as humans do. Deep learning technology drives many AI applications used in everyday products, such as the following:

 Digital assistants
 Voice-activated television remotes
 Fraud detection
 Automatic facial recognition

It is also a critical component of emerging technologies such as self-driving cars, virtual reality, and more.
Deep learning models are computer files that data scientists have trained to
perform tasks using an algorithm or a predefined set of steps. Businesses
use deep learning models to analyze data and make predictions in various
applications.
What are the uses of deep learning?

Deep learning has several use cases in automotive, aerospace, manufacturing, electronics, medical research, and other fields. These are some examples of deep learning:

 Self-driving cars use deep learning models to automatically detect
road signs and pedestrians.
 Defense systems use deep learning to automatically flag areas of
interest in satellite images.
 Medical image analysis uses deep learning to automatically detect
cancer cells for medical diagnosis.
 Factories use deep learning applications to automatically detect when
people or objects are within an unsafe distance of machines.

…………………….. end…………….

Q) Probabilistic modeling?

Probabilistic models are an essential component of machine learning, which aims to learn patterns from data and make predictions on new, unseen data. They are statistical models that capture the inherent uncertainty in data and incorporate it into their predictions. Probabilistic models are used in various applications such as image and speech recognition, natural language processing, and recommendation systems. In recent years, significant progress has been made in developing probabilistic models that can handle large datasets efficiently.
Categories of Probabilistic Models:
These models can be classified into the following categories:
 Generative models
 Discriminative models
 Graphical models

Generative models:
Generative models aim to model the joint distribution of the input and
output variables. These models generate new data based on the
probability distribution of the original dataset. Generative models are
powerful because they can generate new data that resembles the training
data. They can be used for tasks such as image and speech
synthesis, language translation, and text generation.

Discriminative models:
Discriminative models aim to model the conditional distribution of the
output variable given the input variable. They learn a decision boundary
that separates the different classes of the output variable. Discriminative
models are useful when the focus is on making accurate predictions rather
than generating new data. They can be used for tasks such as image
recognition, speech recognition, and sentiment analysis.
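To make the distinction concrete, here is a minimal sketch (assuming scikit-learn and a synthetic two-class dataset; the model choices are only illustrative) that contrasts a simple generative classifier, Gaussian Naive Bayes, which models how each class generates its features, with a discriminative one, logistic regression, which models the class probability given the features directly:

```python
# Hypothetical illustration: generative vs. discriminative classifiers on toy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB           # generative: models P(x, y) via P(x|y)P(y)
from sklearn.linear_model import LogisticRegression  # discriminative: models P(y|x) directly

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

generative = GaussianNB().fit(X_train, y_train)
discriminative = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Generative (GaussianNB) accuracy:", generative.score(X_test, y_test))
print("Discriminative (LogisticRegression) accuracy:", discriminative.score(X_test, y_test))
```

The generative model could also be used to sample new feature vectors for each class, whereas the discriminative model only separates the classes.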
Graphical models:
These models use graphical representations to show the conditional
dependence between variables. They are commonly used for tasks such
as image recognition, natural language processing, and causal inference.
Deep learning, a subset of machine learning, also relies on probabilistic
models. Probabilistic models are used to optimize complex models with
many parameters, such as neural networks. By incorporating uncertainty
into the model training process, deep learning algorithms can provide
higher accuracy and generalization capabilities. One popular technique is
variational inference, which allows for efficient estimation of posterior
distributions.

Importance of Probabilistic Models:


 Probabilistic models play a crucial role in the field of machine
learning, providing a framework for understanding the underlying
patterns and complexities in massive datasets.
 Probabilistic models provide a natural way to reason about the
likelihood of different outcomes and can help us understand the
underlying structure of the data.
 Probabilistic models help enable researchers and practitioners to
make informed decisions when faced with uncertainty.
 Probabilistic models allow us to perform Bayesian inference, which
is a powerful method for updating our beliefs about a hypothesis based
on new data. This can be particularly useful in situations where we
need to make decisions under uncertainty.

Advantages Of Probabilistic Models:


 Probabilistic models are an increasingly popular method in many
fields, including artificial intelligence, finance, and healthcare.
 The main advantage of these models is their ability to take into
account uncertainty and variability in data. This allows for more
accurate predictions and decision-making, particularly in complex and
unpredictable situations.
 Probabilistic models can also provide insights into how different
factors influence outcomes and can help identify patterns and
relationships within data.

Disadvantages Of Probabilistic Models:


There are also some disadvantages to using probabilistic models.
 One of the disadvantages is the potential for overfitting, where the
model is too specific to the training data and doesn’t perform well on
new data.
 Not all data fits well into a probabilistic framework, which can limit
the usefulness of these models in certain applications.
 Another challenge is that probabilistic models can be
computationally intensive and require significant resources to develop
and implement.

…………………………………. END ……………………………….

Q) Differences between Artificial Intelligence (AI) and Machine Learning (ML)?

1. Artificial intelligence is a technology which enables a machine to simulate human behavior, whereas machine learning is a subset of AI which allows a machine to automatically learn from past data without being explicitly programmed.
2. The goal of AI is to make a smart computer system, like humans, to solve complex problems, whereas the goal of ML is to allow machines to learn from data so that they can give accurate output.
3. In AI, we build intelligent systems to perform any task like a human, whereas in ML, we teach machines with data to perform a particular task and give an accurate result.
4. Machine learning and deep learning are the two main subsets of AI, whereas deep learning is a main subset of machine learning.
5. AI has a very wide scope, whereas machine learning has a limited scope.
6. AI works toward creating an intelligent system which can perform various complex tasks, whereas machine learning works toward creating machines that can perform only the specific tasks for which they are trained.
7. An AI system is concerned with maximizing the chances of success, whereas machine learning is mainly concerned with accuracy and patterns.
8. The main applications of AI are Siri, customer support chatbots, expert systems, online game playing, intelligent humanoid robots, etc., whereas the main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
9. On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI, whereas machine learning can be divided into three types: supervised learning, unsupervised learning, and reinforcement learning.
10. AI includes learning, reasoning, and self-correction, whereas ML includes learning and self-correction when introduced to new data.
11. AI deals with structured, semi-structured, and unstructured data, whereas machine learning deals with structured and semi-structured data.

…………………………………. END……………..

Q) Artificial Intelligence?

What is Artificial Intelligence (AI)?

In today's world, technology is growing very fast, and we come into contact with new technologies day by day.

One of the booming technologies of computer science is Artificial Intelligence, which is ready to create a new revolution in the world by making intelligent machines. Artificial Intelligence is now all around us. It is currently at work in a variety of subfields, ranging from general to specific, such as self-driving cars, playing chess, proving theorems, playing music, painting, and more.

AI is one of the fascinating and universal fields of computer science, and it has great scope in the future. AI aims to make a machine work like a human.

Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "a man-made thinking power."

So, we can define AI as:

"It is a branch of computer science by which we can create intelligent


machines which can behave like a human, think like humans, and able to
make decisions."

Artificial Intelligence exists when a machine can have human based skills
such as learning, reasoning, and solving problems

With Artificial Intelligence, you do not need to preprogram a machine to do some work; instead, you can create a machine with programmed algorithms which can work with its own intelligence, and that is the awesomeness of AI.

It is believed that AI is not a new technology; some people say that, according to Greek myth, there were mechanical men in early days which could work and behave like humans.

Why Artificial Intelligence?

Before learning about Artificial Intelligence, we should know the importance of AI and why we should learn it. Following are some main reasons to learn about AI:

o With the help of AI, you can create such software or devices which
can solve real-world problems very easily and with accuracy such as
health issues, marketing, traffic issues, etc.
o With the help of AI, you can create your personal virtual Assistant,
such as Cortana, Google Assistant, Siri, etc.
o With the help of AI, you can build such Robots which can work in an
environment where survival of humans can be at risk.
o AI opens a path for other new technologies, new devices, and new
Opportunities.

Goals of Artificial Intelligence

Following are the main goals of Artificial Intelligence:

1. Replicate human intelligence
2. Solve knowledge-intensive tasks
3. Form an intelligent connection of perception and action
4. Build a machine which can perform tasks that require human intelligence, such as:
o Proving a theorem
o Playing chess
o Planning a surgical operation
o Driving a car in traffic
5. Create a system which can exhibit intelligent behavior, learn new things by itself, demonstrate, explain, and advise its user.

………………………………… END …………………..

Q) Kernel methods?

Kernels, also known as kernel techniques or kernel functions, are a collection of distinct forms of pattern analysis algorithms that use a linear classifier to solve an existing non-linear problem. Support Vector Machines (SVMs) use kernel methods in ML to solve classification and regression problems. The SVM employs the "kernel trick," where data is processed and an optimal boundary for the various outputs is determined.

In other words, a kernel is a term used to describe applying linear classifiers to non-linear problems by mapping non-linear data onto a higher-dimensional space without having to visit or understand that higher-dimensional region.

These are some of the many kernel-based techniques:

 Support Vector Machine (SVM)

 Adaptive Filter

 Kernel Perceptron

 Principal Component Analysis

 Spectral Clustering

1. Support Vector Machine (SVM):

An SVM can be defined as a classifier that separates classes with hyperplanes, where hyperplanes are subspaces with one dimension less than the ambient space. In higher dimensions, support vector machines become much more challenging to interpret, and it is harder to imagine how the data can be separated linearly and what the decision boundary looks like. In p dimensions, a hyperplane is a (p-1)-dimensional "flat" subspace within the larger p-dimensional space. In two dimensions, the hyperplane is simply a line.
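As an illustration of the kernel trick described above, the following sketch (assuming scikit-learn; the dataset and parameter values are illustrative assumptions) fits an SVM with an RBF kernel to data that is not linearly separable in its original space:

```python
# Hypothetical sketch: SVM with an RBF kernel on non-linearly separable data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles cannot be separated by a straight line in 2D.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel struggles here, while the RBF kernel implicitly maps the
# points to a higher-dimensional space where a linear boundary exists.
linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_train, y_train)

print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))
```

The RBF kernel never computes the higher-dimensional coordinates explicitly; it only evaluates similarities between pairs of points, which is exactly the idea described above.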

2. Adaptive Filter:

An adaptive filter is a linear filter whose transfer function is controlled by variable parameters, and these parameters are fine-tuned by an adaptation (optimization) algorithm. Because of the complexity of the optimization algorithm, almost every adaptive filter is a digital filter.

An adaptive filter is required for applications where there is no prior knowledge of the desired performance or where the conditions change. A cost function is applied to the flexible closed-loop filter as needed for optimal filter operation; it determines how to alter the filter transfer function to reduce the cost on subsequent iterations.

3. Kernel Perceptron:

In machine learning, the kernel perceptron is a variant of the popular perceptron learning algorithm used to train kernel machines. It includes non-linear classifiers that use a kernel function to compute the similarity of unseen samples to training samples.

Most of the kernel algorithms discussed are statistically based on convex optimization or eigenproblems. Therefore, statistical learning theory is used to analyze their statistical properties.

Kernel methods have a wide range of applications:

 3D reconstruction
 Bioinformatics

 Geostatistics

 Chemoinformatics

 Handwriting recognition

 Inverse distance weighting

 Information extraction
4. Principal Component Analysis (PCA):

Principal component analysis is a tool used to reduce the dimensionality of data. It allows us to reduce the size of the data without losing much of the information. PCA reduces the size by finding a set of orthogonal directions (the principal components) along which the data varies the most. The first principal component captures most of the variability in the data.

The second principal component is orthogonal to the first and captures the variation left over by the first component, and so on. The principal components are uncorrelated and are ordered so that a few of them describe most of the variation in the actual data. Kernel principal component analysis extends PCA using kernel methods. In contrast to standard linear PCA, the kernel variant works for a large number of attributes but becomes slow for a large number of examples.
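A minimal sketch of the kernel PCA variant mentioned above, using scikit-learn (the dataset and kernel parameters are illustrative assumptions):

```python
# Hypothetical sketch: standard PCA vs. kernel PCA for dimensionality reduction.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA finds orthogonal directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA applies the same idea in an implicit higher-dimensional feature
# space defined by the RBF kernel, which can untangle non-linear structure.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print("Linear PCA output shape:", X_pca.shape)
print("Kernel PCA output shape:", X_kpca.shape)
```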

5. Spectral Clustering:

In the context of image classification, spectral clustering is known as segmentation-based object categorization. In spectral clustering, dimensionality reduction is performed before clustering in fewer dimensions, and this is accomplished by using the eigenvalues of the similarity matrix of the data.

Its roots can be traced back to graph theory, where this method is used to identify communities of nodes in a graph based on the edges connecting them. The method is flexible enough to be applied to non-graph data as well.
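The sketch below (assuming scikit-learn; the dataset and parameters are illustrative) applies spectral clustering to the kind of non-convex data that a graph-based method handles well:

```python
# Hypothetical sketch: spectral clustering on two interleaving half-moons.
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# The affinity matrix plays the role of the graph; clusters correspond to
# groups of points that are strongly connected to each other.
model = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                           n_neighbors=10, random_state=0)
labels = model.fit_predict(X)

print("Cluster sizes:", [(labels == k).sum() for k in (0, 1)])
```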

…………………………………… END………………………

Q) Random forests?
Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of the predictions, predicts the final output.

A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:
Note: To better understand the Random Forest Algorithm, you should
have knowledge of the Decision Tree Algorithm.

Assumptions for Random Forest

Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct
output, while others may not. But together, all the trees predict the correct
output. Therefore, below are two assumptions for a better Random forest
classifier:

o There should be some actual values in the feature variable of the


dataset so that the classifier can predict accurate results rather than a
guessed result.
o The predictions from each tree must have very low correlations.

Why use Random Forest?

Below are some points that explain why we should use the Random Forest
algorithm:

<="" li="">

o It takes less training time as compared to other algorithms.
o It predicts output with high accuracy, and it runs efficiently even on large datasets.
o It can also maintain accuracy when a large proportion of data is missing.

How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions using the trees created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select K random data points from the training set.

Step-2: Build the decision trees associated with the selected data points (subsets).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 and 2.

Step-5: For new data points, find the prediction of each decision tree, and assign the new data point to the category that wins the majority of votes.

The working of the algorithm can be better understood by the below example:

Example: Suppose there is a dataset that contains multiple fruit images. This dataset is given to the Random Forest classifier. The dataset is divided into subsets and given to each decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, then based on the majority of results, the Random Forest classifier predicts the final decision. Consider the below image:
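Complementing the steps and example above, here is a minimal sketch (assuming scikit-learn and a built-in toy dataset rather than the fruit images described):

```python
# Hypothetical sketch: Random Forest classification following the steps above.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# n_estimators is the number N of decision trees; each tree is trained on a
# random bootstrap sample and considers a random subset of features per split.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# The forest predicts by majority vote over the individual trees.
print("Test accuracy:", forest.score(X_test, y_test))
```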
Applications of Random Forest

There are mainly four sectors where Random Forest is mostly used:

1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
3. Land Use: We can identify areas of similar land use with this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest


o Random Forest is capable of performing both Classification and
Regression tasks.
o It is capable of handling large datasets with high dimensionality.
o It enhances the accuracy of the model and prevents the overfitting
issue.

Disadvantages of Random Forest


o Although random forest can be used for both classification and
regression tasks, it is less suitable for regression tasks.

………………… end ………………

Q) Decision tree?

o Decision Tree is a Supervised learning technique that can be used for both Classification and Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of features of
the given dataset.
o It is a graphical representation for getting all the possible
solutions to a problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the
root node, which expands on further branches and constructs a tree-
like structure.
o In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer
(Yes/No), it further splits the tree into subtrees.
o Below diagram explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as
numeric data.

Why use Decision Trees?

There are various algorithms in Machine Learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. Below are two reasons for using the Decision Tree:

o Decision Trees usually mimic human thinking ability while making a decision, so they are easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like structure.

Decision Tree Terminologies


 Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two or more
homogeneous sets.
 Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.

 Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.

 Branch/Sub Tree: A tree formed by splitting the tree.

 Pruning: Pruning is the process of removing unwanted branches from the tree.

 Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called the child nodes.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given dataset, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (the real dataset) and, based on the comparison, follows the branch and jumps to the next node.

At the next node, the algorithm again compares the attribute value with those of the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.
Example: Suppose there is a candidate who has a job offer and wants to decide whether to accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
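A minimal sketch of the algorithm described above, using scikit-learn's CART implementation (the built-in iris dataset is an illustrative stand-in for the job-offer example):

```python
# Hypothetical sketch: training and inspecting a CART decision tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# criterion="entropy" selects splits by information gain; "gini" uses the Gini index.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned decision rules: each internal node is a test on a feature,
# and each leaf holds a predicted class.
print(export_text(tree, feature_names=list(data.feature_names)))
```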

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:

o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about a
class.
o According to the value of information gain, we split the node and build
the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the below formula:

Information Gain = Entropy(S) − [(Weighted Average) × Entropy(each feature)]

Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data. For a two-class problem, entropy can be calculated as:

Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)

where S is the set of samples, P(yes) is the probability of "yes", and P(no) is the probability of "no".
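For clarity, here is a small worked sketch in Python (plain NumPy; the class counts are made-up illustrative numbers) that computes the entropy of a node and the information gain of a candidate split:

```python
# Hypothetical worked example: entropy and information gain for one split.
import numpy as np

def entropy(labels):
    # Entropy(S) = -sum_i p_i * log2(p_i), computed over the observed classes.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Parent node: 9 "yes" and 5 "no" examples (illustrative numbers).
parent = np.array(["yes"] * 9 + ["no"] * 5)
# A candidate attribute splits it into two child nodes.
left = np.array(["yes"] * 6 + ["no"] * 2)
right = np.array(["yes"] * 3 + ["no"] * 3)

# Weighted average of the child entropies, weighted by child size.
weighted_child_entropy = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
info_gain = entropy(parent) - weighted_child_entropy
print(f"Entropy(parent) = {entropy(parent):.3f}, Information gain = {info_gain:.3f}")
```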

2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a
decision tree in the CART(Classification and Regression Tree)
algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o It creates only binary splits, and the CART algorithm uses the Gini index to create binary splits.
o The Gini index can be calculated using the below formula:

Gini Index = 1 − Σj (Pj)²
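A small sketch (plain Python; the class proportions are made up) showing the Gini index computation from the formula above:

```python
# Hypothetical worked example: Gini index of a node from its class proportions.
def gini_index(proportions):
    # Gini = 1 - sum_j (P_j)^2
    return 1.0 - sum(p ** 2 for p in proportions)

# A pure node (all one class) has Gini = 0; a 50/50 node has Gini = 0.5.
print("Pure node:  ", gini_index([1.0, 0.0]))
print("Mixed 50/50:", gini_index([0.5, 0.5]))
print("Mixed 70/30:", gini_index([0.7, 0.3]))
```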


Pruning: Getting an Optimal Decision tree

Pruning is a process of deleting unnecessary nodes from a tree in order to get the optimal decision tree.

A tree that is too large increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. A technique that decreases the size of the learning tree without reducing accuracy is therefore known as pruning. There are mainly two types of tree pruning techniques used:

o Cost Complexity Pruning


o Reduced Error Pruning.

Advantages of the Decision Tree


o It is simple to understand, as it follows the same process which a
human follows while making any decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other
algorithms.

Disadvantages of the Decision Tree


o The decision tree contains lots of layers, which makes it complex.
o It may have an overfitting issue, which can be resolved using
the Random Forest algorithm.
o For more class labels, the computational complexity of the decision
tree may increase.

…………… end ,…………….

Q) Gradient Boosting Machines?

Gradient boosted machines (GBMs) are an extremely popular machine learning algorithm that has proven successful across many domains and is one of the leading methods for winning Kaggle competitions. Whereas random forests build an ensemble of deep independent trees, GBMs build an ensemble of shallow and weak successive trees, with each tree learning from and improving on the previous one. When combined, these many weak successive trees produce a powerful "committee" that is often hard to beat with other algorithms.

Advantages:

 Often provides predictive accuracy that cannot be beat.


 Lots of flexibility - can optimize on different loss functions and
provides several hyperparameter tuning options that make the
function fit very flexible.
 No data pre-processing required - often works great with
categorical and numerical values as is.
 Handles missing data - imputation not required.

Disadvantages:

 GBMs will continue improving to minimize all errors. This can
overemphasize outliers and cause overfitting. You must use cross-
validation to neutralize this.
 Computationally expensive - GBMs often require many trees
(>1000), which can be time and memory exhaustive.
 The high flexibility results in many parameters that interact and
heavily influence the behavior of the approach (number of
iterations, tree depth, regularization parameters, etc.). This
requires a large grid search during tuning.
 Less interpretable, although this is easily addressed with various
tools (variable importance, partial dependence plots, LIME, etc.).

The main idea of boosting is to add new models to the ensemble sequentially. At each particular iteration, a new weak, base-learner model is trained with respect to the error of the whole ensemble learnt so far.

Base-learning models: Boosting is a framework that iteratively improves any weak learning model. Many gradient boosting applications allow you to "plug in" various classes of weak learners at your disposal. In practice, however, boosted algorithms almost always use decision trees as the base-learner. Consequently, this tutorial will discuss boosting in the context of regression trees.

Training weak models: A weak model is one whose error rate is only slightly better than random guessing. The idea behind boosting is that each sequential model builds a simple weak model to slightly improve the remaining errors. With regard to decision trees, shallow trees represent a weak learner. Commonly, trees with only 1-6 splits are used. Combining many weak models (versus strong ones) has a few benefits:

 Speed: Constructing weak models is computationally cheap.
 Accuracy improvement: Weak models allow the algorithm to learn slowly, making minor adjustments in new areas where it does not perform well. In general, statistical approaches that learn slowly tend to perform well.
 Avoids overfitting: Because each model in the ensemble makes only small incremental improvements, we can stop the learning process as soon as overfitting is detected (typically by using cross-validation).
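A minimal sketch of this sequential boosting idea using scikit-learn's gradient boosting implementation (the dataset and hyperparameter values are illustrative assumptions):

```python
# Hypothetical sketch: gradient boosting with shallow trees as weak learners.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the n_estimators trees is shallow (max_depth=3) and is fitted to the
# residual errors of the ensemble built so far; learning_rate shrinks each
# tree's contribution so the model learns slowly.
gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                 max_depth=3, random_state=0)
gbm.fit(X_train, y_train)

print("Test accuracy:", gbm.score(X_test, y_test))
```

In practice, n_estimators, learning_rate, and max_depth are tuned together (for example with cross-validated grid search), which is exactly the tuning burden noted in the disadvantages above.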

……………………….. END ………………


Q) Origins and history of machine learning?

1950 – this is the year when Alan Turing, one of the most brilliant and influential
British mathematicians and computer scientists, created the Turing test. The test
was designed to determine whether a computer has human-like intelligence. In order
to pass the test, the computer needs to be able to convince a human to believe that
it’s another human. Apart from a computer program simulating a 13-year-old
Ukrainian boy who is said to have passed the Turing test, there were no other
successful attempts so far.

1952 – Arthur Samuel, the American pioneer in the field of artificial intelligence and computer gaming, wrote the very first computer learning program. The program was the game of checkers. The IBM computer would first study which moves led to winning and then incorporate them into its program.

1957 – this year witnessed the design of the very first neural network for computers, called the perceptron, by Frank Rosenblatt. It successfully simulated the thought processes of the human brain. This is where today's neural networks originate from.

1967 – The nearest neighbor algorithm was written for the first time this year. It
allows computers to start using basic pattern recognition. This algorithm can be
used to map a route for a traveling salesman that starts in a random city and
ensures that the salesman passes by all the required cities in the shortest time.
Today, the nearest neighbor algorithm, called KNN, is mostly used to classify a data
point on the basis of how its neighbors are classified. KNN is used in retail
applications that recognize patterns in credit card usage or for theft prevention when
implemented in CCTV image recognition in retail stores.

1981 – Gerald Dejong introduced the concept of explanation-based learning (EBL). In this type of learning, the computer analyzes training data and generates a general rule that it can follow by discarding the data that doesn't seem to be important.

1985 – Terry Sejnowski invented the NetTalk program that could learn to pronounce
words just like a baby does during the process of language acquisition. The artificial
neural network aimed to reconstruct a simplified model that would show the
complexity of learning human-level cognitive tasks.
The 1990s – during the 1990s, the work in machine learning shifted from the
knowledge-driven approach to the data-driven approach. Scientists and researchers
created programs for computers that could analyze large amounts of data and draw
conclusions from the results. This led to the development of the IBM Deep Blue
computer, which won against the world’s chess champion Garry Kasparov in 1997.

2006 – this is the year when the term "deep learning" was coined by Geoffrey Hinton. He used the term to describe a new class of algorithms that allow computers to see and distinguish objects or text in images and videos.

2010 – this year saw the introduction of Microsoft Kinect, which could track 20 human features at a rate of 30 times per second. Microsoft Kinect allowed users to interact with machines via gestures and movements.

2011 – this was an interesting year for machine learning. For starters, IBM’s Watson
managed to beat human competitors at Jeopardy. Moreover, Google developed
Google Brain equipped with a deep neural network that could learn to discover and
categorize objects (in particular, cats).

2012 – Google X lab developed a machine learning algorithm able to autonomously browse YouTube videos and identify those that contained cats.

2014 – Facebook introduced DeepFace, a special software algorithm able to recognize and verify individuals in photos at the same level as humans.

2015 – this is the year when Amazon launched its own machine learning platform,
making machine learning more accessible and bringing it to the forefront of software
development. Moreover, Microsoft created the Distributed Machine Learning Toolkit,
which enables developers to efficiently distribute machine learning problems across
multiple machines. During the same year, however, more than three thousand AI
and robotics researchers, endorsed by figures like Elon Musk, Stephen Hawking, and
Steve Wozniak, signed an open letter warning about the dangers of autonomous
weapons that could select targets without any human intervention.

2016 – this was the year when Google’s artificial intelligence algorithms managed to
beat a professional player at the Chinese board game Go. Go is considered the
world’s most complex board game. The AlphaGo algorithm developed by Google
won five out of five games in the competition, bringing AI to the front page.

2020 – OpenAI announced GPT-3, a groundbreaking natural language processing algorithm with a remarkable ability to generate human-like text when given a prompt. At the time, GPT-3 was considered the largest and most advanced language model in the world, with 175 billion parameters, trained on Microsoft Azure's AI supercomputer.
