decisions.
Now almost anyone with a laptop or home computer can work with meaningful amounts of data for business or other purposes. Data is much easier to find, and the machines for processing it have become far more accessible. Work that once required expensive computing power can now be done much more cheaply and quickly.
The advent of cloud technology has made it easier for smaller companies to
access large data sets without the need for huge amounts of data storage.
Now machine learning has become a completely different field of computer
science, with people specializing in machine learning and data science as
their own field.
Nowadays more and more things are connected and the internet is getting
bigger. This means that access to data is increasing, but data sources are
also changing. Even people's cars have computers inside them, which
means they create data that can be interpreted while driving. The vast majority of Americans own a mobile phone, shop on the web, and use apps for navigation. People use their phones to control household appliances,
which is another potential data source. There are Fitbits and smartwatches
that allow people to track health data.
The more devices that are connected, not just computers and phones but all kinds of devices, the greater the possibilities for collecting and studying data. This connection of everything (smartphones, smart cars, and so on) makes people nervous that they risk losing control of their private data. They fear that their privacy is at stake and that someone is always watching them. But machine learning and data analysis make our lives much easier. Finding the right products is easier, navigating is easier, and finding new music is easier. This is all thanks to machine learning.
Image recognition
One of the applications of machine learning models is for sorting and
classifying data. This type of classification can even be used for the
classification of images. Search engines use this kind of algorithm to identify photos, and social media sites now use facial recognition to identify a person in a photo before the photo is even tagged. They do this by learning from data composed of other photos. If your social media
account can recognize your face in a new photo, it's because it created
models with data from all the other photos in your account.
Image recognition techniques require deep learning models. Deep learning models are created with artificial neural networks, which will be discussed in more detail later in this book. Deep learning is the most complex type of machine learning, where data is filtered through several hidden layers of nodes. They are called hidden layers because the models are unsupervised, which means that the features identified by the model are not pre-chosen by the data scientist. Usually the features are patterns that the model identifies on its own. Features identified in neural networks can be quite complicated; the more complex the task, the more layers the model will have. Image sorting models may have only two or three layers, while self-driving cars have between one and two hundred hidden layers.
We have made great strides in this area in recent years due to the increased
availability of computing power. Imagine the computing power needed to
run thousands of data points through hundreds of stacked nodes at once.
Deep learning and artificial neural networks have become more feasible
over the past decade with the improvement of computers and the reduction
of costs for processing large amounts of data, especially since the advent of the cloud, which gives data scientists access to enormous amounts of data without using physical storage space.
There is a website called ImageNet, a great resource for data scientists
interested in photo classification and neural networks. ImageNet is a
database of images that is publicly accessible for use in machine learning.
The idea is that by making it publicly available, improving machine learning techniques becomes a collaboration among data scientists around the world.
The ImageNet database has approximately 14 million photos in the
database, with over 21,000 possible class groups. This offers a world of
opportunities for data scientists to access and classify photos to learn and
experiment with neural networks.
Every year, ImageNet hosts a competition for data scientists worldwide to
create new image classification models. The competition gets tougher every
year. Now they are starting to move to classifying videos rather than
images, which means that the complexity and required processing power
will continue to grow exponentially. Using the millions of photos in the
database, the ImageNet competition has made groundbreaking advances in
image recognition in recent years.
Modern photo classification models require methods that can classify images very specifically. Even if two images have to be placed in the same
category, they can look very different. How do you make a model that can
distinguish them?
Take these two different photos of trees, for example. Ideally, if you were to
create a neural network model that classified images of trees, you would
want your model to categorize both as photos of trees. A person can
recognize that these are both photos of trees, but the characteristics of the
photo would make it very difficult to classify them with a machine learning
model.
The fewer differences the variables have, the easier they are to classify. If
all your photos of trees looked like the image on the left, with the tree in
full view with all its features, the model would be easier to make.
Unfortunately, this would lead to overfitting, and when the model is presented with photos like the one on the right, it may not classify them correctly. We want our model to be able to classify our data even when the images are not so easy to classify.
Incredibly, models built on ImageNet data have been able to classify images with many variables and very similar classes. Recent image recognition models can even identify and categorize photos of different dog breeds. Imagine all the variables and similarities the model would need to recognize in order to properly tell the difference between dog breeds.
The challenge of variation within a single class is known as intra-class variability. If we have an image of a tree stump and a photo of a tree outlined in a field, we are dealing with variability within the class. The problem is that examples within the same class can differ greatly, making it more difficult for our model to predict which category they will fall into. Most importantly, a lot of data is needed over time to improve the model and make it accurate.
To have an accurate model despite high levels of variability within the
class, we will have to use additional techniques with our neural network
models to find patterns between images. One method involves the use of
convolutional neural networks. Instead of just having one model or
algorithm, data is fed through several models stacked on top of each other.
The neural networks convert image features into numerical values to sort
them.
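As a rough sketch of what "models stacked on top of each other" can look like, here is a tiny convolutional network built with the Keras API (this assumes TensorFlow is installed, and the image size and number of classes are made up purely for illustration; it only shows the shape of such a model, not a complete image classifier):

```python
from tensorflow.keras import layers, models

# A small stack of convolutional layers for 64x64 color images and 10 possible classes.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                        # convert image features into a numeric vector
    layers.Dense(64, activation="relu"),     # a hidden layer of nodes
    layers.Dense(10, activation="softmax"),  # one probability per image class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # training would then call model.fit(images, labels)
```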
Unfortunately, it would be beyond the scope of this book to try to
understand the way these deep neural networks work, but there are many
books available that deal with these types of models and also include more
extensive explanations of the coding required to perform this type of
analysis.
Voice recognition
Improvements in artificial intelligence have recently made speech recognition very useful. Most of our smartphones now have some level of speech recognition, which means machine learning. Speech recognition takes the audio data we give it and turns it into text that can be interpreted.
The difficulty of speech recognition is the irregularity in the way people speak, another example of variability within a class. You and I may have different accents and different inflections that are difficult to account for when teaching a computer how to understand the human voice. If we both say the same word with different accents, how do we teach the model to understand us both?
Like image recognition, speech recognition also uses neural networks to interpret data. This is because the patterns in audio data are unlikely to be recognizable to a human. Data scientists use sampling to interpret the data and make accurate predictions despite the differences in people's voices. Sampling is done by measuring the height and length of the sound waves, which can then be used to help decipher what the user is saying. The recorded audio is converted into a map of wave frequencies. Those frequencies are measured as numerical values and then passed through the hidden layers of the neural network to search for patterns.
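As a minimal sketch of turning a recorded wave into numerical frequency values, here is an example using numpy's Fourier transform (the "recording" here is a made-up mix of tones standing in for real speech audio):

```python
import numpy as np

sample_rate = 16000  # samples per second (an assumed recording rate)
t = np.linspace(0, 1, sample_rate, endpoint=False)

# A stand-in for one second of recorded audio: two tones plus a little noise.
audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
audio += 0.1 * np.random.randn(sample_rate)

# Convert the waveform into frequency magnitudes with a Fourier transform.
magnitudes = np.abs(np.fft.rfft(audio))
freqs = np.fft.rfftfreq(len(audio), d=1 / sample_rate)

# The strongest frequencies become numerical features that a network could learn from.
strongest = freqs[np.argsort(magnitudes)[-5:]]
print("Dominant frequencies (Hz):", np.sort(strongest))
```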
Medicine and medical diagnosis
Machine learning is not only useful for digital marketing or to get
computers to respond to your requests. It also has the potential to improve
the medical field, especially in the diagnosis of patients using data from
previous patients.
With as much potential as machine learning has for medical diagnosis, it
can be challenging to find patient data available for machine learning due to
patient privacy laws. Machine learning is gradually gaining acceptance in the field of medicine, meaning more data is becoming available to data scientists.
Unfortunately, until now it has been difficult to have enough meaningful
data to make models related to medical diagnosis. But the technology is
there and available to use.
Machine learning can use image recognition to help diagnose X-rays, using imaging data from many previous patients to make predictions about new patients. Clustering and classification can be used to categorize different
types of the same disease so that patients and medical professionals can
better understand the variation of the same disease between two patients
and their survival rate.
Medical diagnosis with machine learning can reduce doctors' diagnostic errors or provide them with a second opinion. It
can also be used to predict the likelihood of a positive diagnosis based on
patient factors and disease characteristics. One day, medical professionals
may be able to view data from thousands of patients about a particular
disease to make a new diagnosis.
But medical diagnosis is just one of many ways machine learning can be
used in medicine. Medical datasets remain small today, and the science of
machine learning still has a lot of untapped potential in the field of
medicine.
Stock forecasts
Stock traders look at many variables to decide what to do with a stock: whether they want to buy, sell, or wait. They look at certain features of a
stock and trends in the market environment to make an informed estimate
of what to do. It has been done this way for years. Brokers and traders had
to do manual research to make the best estimate.
Machine learning can now be used to do the same, except that machine
learning can be done much faster and more efficiently. To be an effective
trader you need to be able to analyze trends in real time so you don't miss
out on opportunities. Machine learning can help traders find relationships between stocks and make financial decisions using statistical data.
Traders can use linear regression models to study data on trends in past
stock prices and what variables cause a stock price to go up and down. They
can use these regressions to decide what to do with a stock.
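As a rough sketch of that idea, here is a linear regression fit on made-up stock features using scikit-learn (the feature names and numbers are purely hypothetical, not real market data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training rows: [previous_close, volume_in_millions, market_index]
X = np.array([
    [101.2, 3.1, 2900],
    [102.5, 2.8, 2915],
    [100.9, 4.0, 2890],
    [103.3, 3.5, 2930],
])
y = np.array([102.0, 103.1, 101.5, 104.2])  # made-up next-day closing prices

model = LinearRegression().fit(X, y)
print("Coefficient per variable:", model.coef_)
print("Predicted next close:", model.predict([[103.0, 3.2, 2925]]))
```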
Traders who want to analyze the performance of stocks often do this by
using a so-called support vector machine. A support vector machine is a classification model in which data points are separated by a boundary line, with one category on one side and the other category on the other. Traders will
use support vector machines to classify which stocks to buy and which
stocks to sell. Using certain variables that should be indicative of the
performance of a particular stock, that stock is placed on the side of the
boundary line that indicates whether the price is likely to rise or fall.
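A minimal sketch of that kind of buy/sell classifier, assuming scikit-learn's SVC and two made-up indicator features:

```python
from sklearn.svm import SVC

# Hypothetical features per stock: [price_to_earnings, revenue_growth_pct]
X = [[12.0, 5.0], [30.0, 1.0], [9.5, 8.0], [40.0, -2.0], [15.0, 6.5], [35.0, 0.5]]
y = ["buy", "sell", "buy", "sell", "buy", "sell"]  # past labels supplied by the trader

clf = SVC(kernel="linear")  # a linear boundary separating the two classes
clf.fit(X, y)

# Which side of the boundary does a new stock fall on?
print(clf.predict([[11.0, 7.0]]))
```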
Deep learning is also often used when making stock models. The hidden layers of a neural network can be helpful in identifying unseen trends or characteristics of a stock that could cause it to rise or fall in price.
There is no such thing as a certain bet or a risk-free investment. This was
true when people made decisions, and it is still true when we use data
science to make financial predictions. It is important to remember that
investing in the stock market will always be risky. It is impossible to create a model that reliably predicts the stock market. It is wild and
unpredictable. But we have already learned that machine learning can find
patterns that people may not be able to find on their own.
If you understand that stock market trends can be completely arbitrary and
unpredictable, it is helpful to have another model that allows you to
estimate the predictability of stocks. Knowing how accurate your
predictions are for a given stock is just as important as the predictions
themselves. Create a separate model to measure the predictability of a
particular stock so you know how reliable your predictions are. Different
stocks have different predictability levels. It is important to illustrate that
with your model so that you can choose from the most reliable predictions.
Traders still make the final decision on whether a stock will go up or down in value. But data science and machine learning can streamline the information analysis that aids the decision-making process. That's why you see more and more examples of machine learning models being used to predict stock prices, and why you should at least familiarize yourself with the idea.
Learning associations
Marketers in all areas, from brick and mortar stores to online stores, are
always looking for ways to connect products and increase sales. Whether
you own a small bike shop or a huge online warehouse, finding patterns in your customers' buying behavior will help you make proactive decisions to
boost sales and make more money.
Most of us will visit a supermarket during a certain week. Supermarkets are
a perfect example of using product positioning to generate sales. Each
supermarket will organize itself so that similar items are placed together.
Baked goods have their own aisle, while fruits and vegetables have their
place. They do this for two reasons: it makes it easier for shoppers to find what they need, and it improves the customer experience. Product positioning
can also help put customers in touch with products that they want to buy but
weren't looking for when they first walked into the store.
In addition to placing the vegetables in the same aisle, there is yet another
strategy that supermarkets can implement to lead customers to certain
products. They can infer characteristics of a customer buying a specific product and use them to recommend other, seemingly unrelated products. For example,
you can assume that someone who buys fresh vegetables from the vegetable
aisle will eat healthier. You can put vegetable smoothies in the same
refrigerator where you store fruit. If a customer is looking for craft beer, you can tempt him with a snack and place the kettle chips in the same aisle as 12-packs of light beer.
If all that makes sense to you, you're on your way to understanding a
technique called collaborative filtering. It is a machine learning technique
widely used in internet marketing. If your search data shows that you have looked at airline tickets to Cancun, you may see swimwear advertisements in your browser.
Marketing data scientists are always trying to answer this question: how can we use data to find a way to link a product to its target group? The point is to use data to link two otherwise unrelated products together to drive sales.
It is a way of making recommendations to a customer based on what you
know about him. Machine learning can often find similarities or buying patterns among customers that we may not have been looking for. This is a powerful marketing tool that has begun to emerge in modern times.
Previously, most marketing agencies had to use intuition to find their target
markets. Data scientists can now use quantitative data to paint a more
accurate picture of their ideal customer. If you are interested in using
machine learning in digital marketing, this is a subject you should know.
Collaborative filtering is different from just promoting a similar product to
a customer. You make predictions about a customer's taste or preferences
based on data you have collected from other customers. You base this
prediction on a correlation you found between two products, and then a
measure of the likelihood that product Y will be bought with product X.
You use these estimates to decide what to market and to whom.
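As a very small sketch of finding such a correlation between products, here is an item-to-item similarity calculation with numpy (the purchase matrix and product names are made up; real collaborative filtering systems work with far larger matrices and more sophisticated methods):

```python
import numpy as np

# Hypothetical purchase matrix: rows are customers, columns are products
# (1 = bought, 0 = not bought).
purchases = np.array([
    [1, 1, 0, 0],  # customer A
    [1, 1, 1, 0],  # customer B
    [0, 1, 1, 1],  # customer C
    [0, 0, 1, 1],  # customer D
])
products = ["craft beer", "kettle chips", "vegetables", "smoothies"]

def cosine(u, v):
    # Cosine similarity: how often two products appear in the same carts.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = purchases[:, 0]  # the column for craft beer
for j in range(1, purchases.shape[1]):
    print(products[j], round(cosine(target, purchases[:, j]), 2))
```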
Spotify uses a similar process when making song recommendations. It uses
data from all the music that you liked over time. If there is a connection
between two artists, which means that many people have both artists in their
library, the model can predict the probability that you will like the other
artist.
The more products you have in your store, the more intensive it will be to
find these correlations. In a perfect world, you will look for correlations
between every different combination of products you have in your store.
This method of finding the probability that you like one product based on
buying another product is called the Apriori algorithm. There are three
criteria that must be met to confirm that there is a link between the two
products and that you must somehow link them in your store. The first
criterion is support. This gives you a measure of the popularity of a specific
product. Of all your transactions, how often does this item appear in
people's shopping cart?
The second criterion is confidence: the strength of the link between the two products. How likely are customers to buy product Y when they buy product X? The third criterion is the lift of product Y. In other words, how much more likely is someone to buy Y with X than you would expect from the popularity of Y alone?
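To make those three numbers concrete, here is a tiny sketch in plain Python that computes support, confidence, and lift for two products over a made-up list of shopping baskets (real Apriori implementations scan every product combination, which this sketch does not attempt):

```python
# Hypothetical shopping baskets.
baskets = [
    {"craft beer", "kettle chips"},
    {"craft beer", "kettle chips", "salsa"},
    {"craft beer"},
    {"kettle chips", "soda"},
    {"milk", "eggs"},
]
n = len(baskets)

def support(*items):
    # Fraction of all baskets that contain every listed item.
    return sum(all(i in b for i in items) for b in baskets) / n

support_x = support("craft beer")                               # popularity of X
support_y = support("kettle chips")                             # popularity of Y
confidence = support("craft beer", "kettle chips") / support_x  # P(Y bought | X bought)
lift = confidence / support_y                                   # how much X boosts Y

print(f"support(X)={support_x:.2f}  confidence={confidence:.2f}  lift={lift:.2f}")
```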
The model can also use data from things like purchases, social media
engagements, etc. to make a prediction on the type of product you like. This
sets it apart as machine learning rather than just data analysis because the
model was looking for similarities, but the programmer didn't ask for a
specific output. Perhaps there are certain features or characteristics of the
group that the programmer is not even aware of. Perhaps unsupervised machine learning reveals a high correlation between two types of customers. These correlations take place all around us, with similarities
between groups of people. Sometimes a good computer model is needed to
recognize the patterns in the data. Machine learning can find similarities
that would be impossible to see without the help of computers and good
models.
Data scientists in marketing industries are already using these metrics to improve their online stores, and if you want to pursue online retailing, it's a good idea to start by reading about how data can help you identify similarities and trends across products, with machine learning as your tool.
Finance
The financial sector is seeing an increase in the use of machine learning.
The use of data science and machine learning models makes the decision-
making process faster and more efficient for financial institutions. The
possibilities and applications of machine learning can be misunderstood,
which means that it is often underutilized or misused in the financial sector.
Work that was once tedious and required hundreds of hours of human work
can now be done by a computer in minutes. A well-known example is the
use of machine learning for call center and customer service work. Many of
the tasks that once required a human operator can now be accomplished
over the phone with a robot designed with machine learning.
In addition to customer service, banks can now process and analyze contracts and financial information from thousands of customers, work that would otherwise be labor intensive. This information is used to prepare credit reports and predict the likelihood of a customer defaulting on a loan. Machine learning techniques can review a borrower's transaction history before the bank decides whether to lend money to that person.
Machine learning is also used in fraud prevention, and it has made the financial sector safer. Machine learning has improved banks' ability to detect patterns in transactions that are indicative of fraud. Rather than assigning people to track transactions and look for signs of fraud, machine learning models can learn from fraud data to find patterns, automatically searching through millions of customer transactions.
Spam detection
A well-known example of a relatively simple machine learning tool is spam detection. If we use supervised learning and define the relevant variables, the model will have certain characteristics to look for in received email messages. The model may search for certain keywords or phrases to detect whether an email is spam or not. Words like "buy" or "save" can let your inbox know when you receive a spam email. The problem with this method is that these words do not always indicate spam, and there may be other keywords or phrases that we would overlook.
This is where reinforcement learning comes in handy. There are so many features that can be an indication of spam email, and some of them we might not even be able to explain. Reinforcement learning makes it possible to find these patterns independently, without explicit guidance. Instead, we tell the model when it has correctly identified spam. Sometimes we find in our inbox an email message that the model has not classified as spam, so we manually move it to our spam folder. Now the model knows that this message is spam, and this piece of data is added to the model to improve the forecast next time. So over time, the machine gets better.
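As a rough sketch of the keyword-counting core of such a filter, here is a small supervised example using scikit-learn (the emails and labels are made up; the feedback loop described above would simply append each newly corrected message to the training set and refit the model):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labeled emails: 1 = spam, 0 = not spam.
emails = [
    "Buy now and save big on exclusive offers",
    "Meeting moved to 3pm, see agenda attached",
    "Save 90% today, limited time offer, buy now",
    "Lunch tomorrow? Let me know what works",
]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()        # turn keywords into word counts
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)

new_email = ["Exclusive offer: buy today and save"]
print(model.predict(vectorizer.transform(new_email)))  # expected output: [1]

# When a user moves a missed message to the spam folder, that message and its
# new label can be added to the training data and the model refit.
```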
In the top right target, the model suffers from high variance. You can see that our data points are centered on the bullseye but widely scattered. The average distance between the predicted values and the bullseye is high due to the large variance.
In the lower left target, the model did not suffer much from variance. The predicted data points are tightly clustered, but they are not clustered around the bullseye; they sit slightly off target due to high bias. This is likely due to insufficient training data, which means that the model will not perform well when introduced to new data.
The bottom right model has both a high variance and a high bias. In this
worst case scenario, the model is very inaccurate because the average
distance between predicted data points and the actual value is high and the
predicted data points are skewed.
Variance can be caused by a significant degree of correlation between
variables. Using too many independent variables can also be a cause of the
high variance. Sometimes, if the variance is too great, we can combat that
by allowing a small amount of bias in the model. This is known as
regularization. We will discuss that a little later.
In statistics, the population is the group of people or dataset you are trying
to analyze. The sample is the subgroup of that population, whose data you
use to create your model. The parameters are the characteristics of the population that you are trying to identify and use to make predictions with your model.
Descriptive statistics are the use of data to study a population. Descriptive statistics typically include the mean or average, the median, the mode, and measures such as correlation. Machine learning falls into the category of inferential statistics, because we use the data not only to find patterns and relationships but also to make predictions based on this information. Inferential statistics use the characteristics of a sample to make predictions about the wider population. This is where your regression models and classification models come in. When we infer something, we draw a logical conclusion about a population from the knowledge we gain.
If you look at data, also pay attention to the distribution. This is how the
data is distributed in our chart. It shows the frequency of values of our data
and how they appear in combination with each other.
We use our variance to find the standard deviation. Standard deviation is the square root of the variance; it describes how far, on average, values fall from the mean, and it helps us judge how spread out the errors of a regression or prediction model are.
We also need to be aware of models that suffer from over- and underfitting. An overfitted model is good at predicting results using the training data, but when it is introduced to new data, it struggles. It is like a model that memorizes instead of learns. It can happen if you don't use random data in your training sample.
Underfitting describes a model that is too simple and does not capture significant patterns in the data. It may still make predictions, but the variables and parameters are not specific enough to give us meaningful insights. If you don't have enough training data, your model may underfit.
One of the most common mistakes when looking at data is confusing correlation with causality. If I told you that every person who committed a murder last year bought eggs every week, I still couldn't claim that people who buy eggs are murderers. Maybe I look at my data and see an increase in the number of people buying milk as well as an increase in teenage pregnancies. Could I argue that there is a connection between people who drink a lot of milk and teenage pregnancies? Or that teens who got pregnant caused people to buy more milk?
This is the difference between correlation and causation. Sometimes the
data shows trends that appear to be related. When two events are correlated,
it means that they seem to have a relationship because they move along the
graph on a similar trajectory over a similar span of time. Causality, on the other hand, means that one event actually causes the other.
A number of criteria must be met to suggest that two events are causally related. The first is covariation. The causal variable and the event it is said to cause need to be covariant, meaning that a change in one leads to a change in the other.
The second criterion to be met is that the causative event must occur before
the event that it is supposed to have caused. For an event to be considered causal, it must come first.
Third, the data scientist should check for external factors. To make it clear that one event causes the other, you must be able to demonstrate that other variables are not the real cause. If the causal variable still creates the effect even when other variables are considered, you can argue that there is a causal relationship.
Choosing the right type of model for
machine learning
Imagine yourself as a carpenter. What kind of tools do you think you have
loaded in your truck when you arrive at a workplace? You will likely have a
hammer and a drill, as well as a few different types of saws. You probably have a few planes and a good set of drill bits. If you know how to do your job, you will know the purpose of each of these tools and when to use them. Each of these tools has a specific purpose. The drill cannot do the work of a hammer, nor would you attempt to cut anything with a hammer.
A data scientist who wants to do machine learning has their own set of
tools, each with a different purpose and designed for a different function.
Depending on the type of data you use and what you want to know, you
have to choose different algorithmic models to do the job.
Statistical algorithms can serve different purposes. Some predict a value, such as a regression model that predicts your income based on your years of education and work experience. Some models predict the probability of an event, such as a medical model that predicts the probability that a patient will survive for a year or two. Other models sort things by placing them in different categories or classes, such as photo recognition software sorting photos of different types of dogs.
Depending on the result you are looking for, you will reach for different tools in your statistical tool belt. You must familiarize yourself with the technical skills of statistics. You also need to know which tool to use and when to use it. Here I have made an extensive list of the different types of statistical models that are common in machine learning. To be able to write the code to build these models yourself, I recommend that you take some time to study the programming language you have chosen. But this list gives you an introductory understanding of each type of model and when it is useful.
In order for machine learning to be effective, you need to choose the model that works best and have data that is relevant to the model and the question at hand.
Today, especially when using the Internet and digital marketing, there are
certain questions that cannot be properly understood without the use of data
and machine learning that can analyze it. Machine learning and data science
allow you to track your customers and their buying habits so you can better
adapt to their needs as they change.
The better you interpret your data, the easier it is to identify trends and
patterns, so you can anticipate the next change.
Machine learning can be divided into three different categories, each with unique algorithms serving different purposes. For starters, we'll talk about the differences between supervised, unsupervised, and reinforcement learning.
Supervised learning
You can see that the values of X and Y don't create a perfect line, but there
is a trend. We can use that trend to make predictions about future values of
Y. So we create a multiple linear regression and end up with a line going through the center of our data points. This is called the best fit line and it is how we
will predict our Y when we get new X values in the future.
The difference here is that instead of writing m for the slope, we write β. The equation is much the same as if I had written
Y = b + m1X1 + m2X2 + m3X3
Except now we have labels and we know what our X's and our Y's are. If
you see a multi-line equation in the future, it will most likely be written in
this form. Our β is what we call a parameter. It is like a magic number that
tells us what effect the value of our X has on the Y. Each independent
variable has a unique parameter. We find the parameters by making a
regression model. Over time, with machine learning, our model will be exposed to more and more data, improving our parameters and making our model more accurate.
We can make the model by using training data with the actual prices of New York City apartments and the actual input variables of square meters, distance to transportation, and number of rooms. Our model 'learns' to approximate the price based on real data. Then, when we plug in the independent variables for an apartment with an unknown price, our model can predict what the price will be.
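A minimal sketch of that model, assuming scikit-learn and completely made-up apartment data (the feature values and prices below are invented for illustration only):

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training apartments: [square_meters, km_to_transit, rooms]
X_train = [
    [55, 0.3, 2],
    [80, 1.0, 3],
    [40, 0.1, 1],
    [100, 2.0, 4],
]
y_train = [650_000, 820_000, 500_000, 900_000]  # made-up sale prices

model = LinearRegression().fit(X_train, y_train)
print("Learned parameters (one per variable):", model.coef_)
print("Intercept (b):", model.intercept_)

# Predict the price of an apartment the model has not seen before.
print(model.predict([[70, 0.5, 3]]))
```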
This is supervised learning using a linear regression model. It is supervised because we tell the model what answer we want it to give us: the price of apartments in New York City. It learns to predict the price more accurately as it gets more data and we continue to evaluate its accuracy.
Ordinary least squares (OLS) regression will try to find the regression line that minimizes the sum of squared errors.
Polynomial regression. Our next type of regression is called a polynomial
regression. In the last two types of regression, our models created a straight
line. This is because the relationship between our X and Y is linear, which
means that the effect X has on Y does not change if the value of X changes.
In polynomial regressions, our model results in a line with a curve.
If we tried to use linear regression to fit a graph that has nonlinear features, we would do a poor job of drawing the best fit line. Take the graph on the left, for example; the scatter plot has an upward trend as before, but with a curve. In this case, a straight line does not work. Instead, with a polynomial regression we will fit a line with a curve corresponding to the curve in our data, like the graph on the right.
The equation of a polynomial will look like the linear equation, with the
difference that there will be a polynomial expression on one or more of our
X values. For example:
Y = mX² + b
The effect X has on Y is no longer constant; it changes as the value of X changes.
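As a small sketch of fitting a curved trend, here is a degree-2 polynomial regression built with scikit-learn (the x and y values are invented so that y grows roughly with the square of x):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data with a curved trend.
x = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.5, 35.9])

poly = PolynomialFeatures(degree=2)  # adds an x^2 column alongside x
x_poly = poly.fit_transform(x)

model = LinearRegression().fit(x_poly, y)
print(model.predict(poly.transform([[7]])))  # follows the curve, not a straight line
```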
Support vector machines. This is another important tool for data scientists and one that you should familiarize yourself with. It is most commonly used for classification. The idea here is to find a line through a space that separates data points into different classes. The support vector machine is another type of supervised learning, and a related technique, support vector regression, is also used for regression analysis. It is a type of binary classification technique that is not based on probability. With a support vector machine, all your training data falls into one category or the other. You want to know into which category a new data point falls. Your data is separated by a hyperplane into these two classes. When
creating a model for
degree,
followed by the decision nodes, which all lead to a decision whether or not to hire or to award an application. You can see from this decision tree why this type of
model is so specific to the specific data you are working with, because each
dataset has different qualifications and thus each dataset will be sorted
differently.
Random forests
Using only one decision tree in your model can limit the categories into which the data is split and the outcomes of the decisions. Decision trees are "greedy," which means that certain features are chosen first for sorting and other features may never be chosen at all. But there is an easy way around that. One way to diversify your decision trees and improve the accuracy of your model is to use random forests.
Just as a real forest consists of many trees, a random forest consists of many decision trees. Instead of having just one decision tree, you split the data across several decision trees. If you only have one tree, models can often suffer from high variance. Creating a random forest is a way to combat that in your model. It is one of the best tools available for data mining. A random forest is about as close as you can get to a prepackaged algorithm for data mining purposes.
In a random forest, all the trees work together. The overall result of all the trees is usually correct, even if a few trees end up with poor predictions. To make the final forecast, the results of all the trees are combined, using votes or the mean value across all trees to give us a definitive prediction.
Since the trees use comparable data, there is a risk of correlation between them if they all try to do the same thing. If we use trees that are less correlated, the model performs better.
Imagine we bet on a coin flip. We have a hundred dollars each and there are three choices. I can flip the coin once and the winner of that toss can keep the $100. Or I can flip the coin ten times and we bet ten dollars each time. The third option is to flip the coin 100 times and bet a dollar on every toss. The expected result of every version of this game is the same. But if you do 100 tosses, you are less likely to lose all your money than if you do only one toss. Data scientists call this method bootstrapping. It is the machine learning equivalent of diversifying a stock portfolio. We want to have a model that gives us an accurate prediction. The more we split our decision trees, the more accurate our predictions will be. But it is important that the individual trees have little correlation with each other. The trees in the forest must be diverse.
How do we prevent correlation in a random forest? Each tree first takes a random sample from the dataset, so that each tree works with a slightly different set of data. Each tree then chooses the feature that creates the most separation between nodes, in a greedy process, just like an individual tree. However, in a random forest, each tree can only choose from a random subset of the overall group of features, so the trees end up splitting on different features. The trees are therefore not correlated, because they use different features to make decisions about classification. In a random forest, it is best to use at least 100 trees to get an accurate view of the data, depending on the dataset you are working with. In general, the more trees you have, the less your model will become overfit. Random forest learning is sometimes called a "low-supervision technique" because our outcome has been chosen and we can see the sorting method, but it is up to each tree to categorize and separate variables by attributes.
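A rough sketch of such a forest, assuming scikit-learn and a made-up hiring dataset like the one the decision-tree example above hinted at (the features, labels, and settings here are illustrative only):

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical applicant data: [years_experience, has_degree, interview_score]
X = [
    [1, 0, 55], [7, 1, 88], [3, 1, 70], [10, 0, 60],
    [5, 1, 92], [0, 0, 40], [8, 1, 81], [2, 0, 65],
]
y = [0, 1, 1, 0, 1, 0, 1, 0]  # 1 = hired, 0 = not hired (made up)

forest = RandomForestClassifier(
    n_estimators=100,     # at least 100 trees, as suggested above
    max_features="sqrt",  # each split considers only a random subset of features
    bootstrap=True,       # each tree trains on a random sample of the data
    random_state=0,
)
forest.fit(X, y)

# The trees vote on a new applicant.
print(forest.predict([[4, 1, 75]]))
```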
Classification models tell us which category something falls into. The categories are initially defined by the programmer. An example of a classification model that a random forest could be used for is a model that determines whether incoming emails should be placed in your "inbox" or your "spam" folder.
To create the model, we create two categories that our Y can fall into: spam and not spam. We program the model to search for keywords or a specific
email address that may indicate spam. The presence of words such as "buy"
or "offer" will help the model determine whether the email falls into the
spam category or not. The algorithm records data and learns over time by
comparing its predictions to the actual value of the output. Over time, it
makes minor adjustments to its model, making the algorithm more efficient
over time.
Classifications
A few times in this book we have referred to classification models. Some of the models we have already mentioned can be used for classification, but the following are supervised learning models used specifically for classification.
Classification requires labeled data and creates discontinuous predictions.
The graphs are non-linear for classification problems. There can be two
classes in a classification problem, or even more. Classification models are
probably the most widely used part of machine learning and data science.
The first type of classification is binary classification. In binary
classification, the data is classified into two categories, labeled 1 or 0. We
call it binary classification because there are only two possible categories,
and all of our data falls into one or the other.
But there are cases where we have more than two categories and for this we
use multi-class classification models. We also have linear decision boundaries, which separate data on either side of a line. Not all data can be cleanly separated by a linear decision boundary.
The first image shows an example of a classification with a linear decision boundary. In the second image, there are still two classes, but they cannot be separated linearly. In the third image, data points are mixed and linear boundary classification is not possible. Depending on the type of data you use, there are different model choices that are better suited to different tasks.
Logistic regression / classification
This method is used when the dependent variable is categorical. Logistic regression calculates probabilities based on the independent variables, treating the outcome as a "yes or no" question in order to sort the data. It is usually used for binary classification.
If you cannot separate the data into classes using a linear boundary, as in the
examples above, this is the method to use. It is one of the most common
types of machine learning algorithms. Not only does it sort into categories,
but it also tells us the probability that a category exists.
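A minimal sketch using scikit-learn, with made-up numbers (hours studied versus passing an exam) chosen purely to show the class prediction and the probability it comes with:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: [hours_studied] -> passed the exam (1) or not (0)
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = LogisticRegression().fit(X, y)
print(clf.predict([[4.5]]))        # the predicted class
print(clf.predict_proba([[4.5]]))  # the probability of each class
```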
There are a few factors to consider when choosing the value for k. Up to a point, a higher number for k gives a more stable classification of our new data point, but there is an optimal value beyond which k should stop rising. If you choose a number for k that is too low, chances are your model will be noisy and overfit to individual data points. If you use too high a number, the computational power required to calculate the value becomes too expensive. You
may consider using an odd number for k rather than an even number, since an odd number makes a tie between the classes voting on a data point less likely. Data scientists often choose the number 5 as the default for k.
Using a large number for K will be very data intensive. Large datasets are
also difficult to use with KNN machine learning models. If you are using
larger data sets, you need to calculate the distance between hundreds or
perhaps thousands of data points. KNN also does not perform well when the data has many dimensions. Again,
it has to do with the computing power required to calculate this distance
between many data points.
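A small k-nearest-neighbors sketch, assuming scikit-learn and a handful of made-up two-dimensional points:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical points in two dimensions, each labeled "A" or "B".
X = [[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6]]
y = ["A", "A", "A", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5, the common default mentioned above
knn.fit(X, y)

# The five nearest points vote on the class of the new point.
print(knn.predict([[5, 5]]))
```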
Support vector classifiers
A support vector classifier is another type of classifier. It classifies using a hyperplane. In general, we use a support vector model with smaller data sets, where it performs reasonably well.
Kernel support vectors
While we will only briefly touch on kernel support vectors here, they are used to sort classes that cannot be separated with a linear divider. The dividing line can take many forms (linear, nonlinear, polynomial, radial, sigmoid).
Recall the linear-boundary classification we just talked about, where the classification looked something like the following image:
So we can also classify new data points using Bayes' theorem. The way it works is that when we get a new data point, we calculate the probability that it falls into each category based on the characteristics of that data point.
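As a rough illustration of that idea, here is a naive Bayes classifier built with scikit-learn on made-up measurements (the feature values and labels are invented for the example):

```python
from sklearn.naive_bayes import GaussianNB

# Hypothetical measurements: [petal_length, petal_width] -> size label
X = [[1.4, 0.2], [1.3, 0.2], [4.7, 1.4], [4.5, 1.5], [5.9, 2.1], [6.0, 2.2]]
y = ["small", "small", "medium", "medium", "large", "large"]

nb = GaussianNB().fit(X, y)
print(nb.predict([[4.9, 1.6]]))        # the most probable class for the new point
print(nb.predict_proba([[4.9, 1.6]]))  # Bayes-derived probability of each class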
Unsupervised learning
Unsupervised machine learning uses unlabeled data. Data scientists do not know the output in advance. The algorithm must discover patterns itself, where the patterns would otherwise be unknown. It finds structure where structure is otherwise imperceptible. The algorithm independently finds segments in the data. The model looks for patterns and structure in an otherwise unlabeled and unrecognizable mass of data. Unsupervised learning allows us to find patterns that would not be observable otherwise. Huge data sets often contain patterns, but it would be impossible for a person to search through them all to find the trends.
This is good for researching consumer buying behavior so that you can
group customers into categories based on patterns in their behavior. The
model may discover that there are similarities in buying patterns between
different subsets of a market, but without a model to search through these vast amounts of complicated data, you would never realize the
nature of these patterns. The beauty of unsupervised learning is the ability
to discover patterns or features in vast amounts of data that you wouldn't be
able to identify without the help of your model.
A good example of unsupervised learning is fraud detection. Fraud can be a
major problem for financial companies, and with large amounts of daily
users, it can be difficult for companies to identify fraud without the help of
machine learning tools. Models can learn to recognize fraud as tactics
change with technology. To tackle new, unknown fraud techniques, you
must use a model that can detect fraud under unique circumstances.
It is better to have more data when detecting fraud. Fraud detection services
should use a range of machine learning models to effectively fight fraud.
Using both supervised and unsupervised models is common. It was estimated that approximately $32 billion in fraudulent credit card activity would take place in 2020 alone. Fraud detection models classify the output (credit card transactions) as legitimate or fraudulent.
Transactions can be classified based on features such as the time of day or the location of the purchase. If a cardholder's purchases are usually around $20 and there is suddenly a sale of $8,000 from a foreign location, the model will most likely classify this transaction as fraudulent.
The challenge of using machine learning for fraud detection is that most transactions are not fraudulent. (If a significant share of transactions were fraudulent, credit cards would not be a viable business.) The percentage of fraudulent card transactions is so small that models trained on the data can be heavily skewed toward the legitimate class. The $8,000 purchase in a strange location is suspect, but it may be the result of a traveling cardholder rather than fraudulent activity. Unsupervised learning makes it easier to identify suspicious buying patterns such as strange shipping locations and sudden jumps in user activity.
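One unsupervised approach to spotting such outliers is an isolation forest, sketched below with scikit-learn and made-up transaction features (this is only one of many anomaly detection techniques, and the numbers are invented):

```python
from sklearn.ensemble import IsolationForest

# Hypothetical transactions: [amount_in_dollars, distance_from_home_km]
transactions = [
    [22, 3], [18, 5], [25, 2], [30, 8], [19, 4],
    [21, 6], [24, 3], [8000, 4200],   # the last one looks unusual
]

detector = IsolationForest(contamination=0.1, random_state=0)
detector.fit(transactions)

# -1 marks the points the model isolates as anomalies, 1 marks normal points.
print(detector.predict(transactions))
```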
Clustering
Clustering is a subgroup of unsupervised learning. Clustering is the task of grouping similar things. When we use clustering, we can identify attributes and sort our data based on those attributes. When we use machine learning for marketing, clustering can help us identify similarities among groups of prospective customers. Unsupervised learning can help us sort customers into categories that we might not have thought of on our own. It can also help you sort your data when working with a large number of variables.
K-means clustering
K-means clustering works in a similar way to k-nearest neighbors. You choose a number for k to decide how many groups you want to see. You continue to cluster and repeat until the clusters are clearly separated. Your data is grouped around centroids, which are points in your chart around which the data will be clustered. You start with k of them, chosen at random. Once you have introduced your data into the model, each data point is assigned to the category of its nearest centroid, measured by Euclidean distance. Then you take the average value of the data points around each centroid and move the centroid there. Keep repeating this process until your results stay the same and you have consistent clusters. Each data point is assigned to only one cluster.
You repeat this process by finding the mean values for x and y within each cluster, which gives you the new centroid of the data points in each cluster. K-means clustering can help you identify previously unknown or overlooked patterns in the data.
Choose the value for k that is optimal for the number of categories you
want to create. Ideally, you should have more than 3. The benefit associated
with adding more clusters decreases as the number of clusters increases.
The higher the value for k you choose, the smaller and more specific the
clusters are. You wouldn't want to use a value for k equal to the number of
data points, because each data point would end up in its own cluster.
You should know your dataset well and use your intuition to guess how
many clusters are suitable and what kind of differences there will be.
However, our intuition and knowledge of the data is less helpful if we have
more than just a few potential groups.
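A minimal k-means sketch, assuming scikit-learn and a made-up set of customer spending figures, with k chosen to be 3:

```python
from sklearn.cluster import KMeans

# Hypothetical customer data: [annual_spend, visits_per_month]
customers = [
    [200, 1], [220, 2], [250, 1],    # low spenders
    [900, 8], [950, 9], [1000, 7],   # frequent, high spenders
    [500, 4], [520, 5], [480, 4],    # a middle group
]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # which cluster each customer was assigned to
print(kmeans.cluster_centers_)  # the final centroids
```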
Dimensionality reduction
When you use dimensionality reduction, you shrink the data to remove unwanted features. Simply put, you reduce the number of variables in a data set.
If we have many variables in our model, we run the risk of dimensionality problems. These are problems unique to models with many variables, and they can affect the accuracy of the prediction. If we have many variables, we need larger populations and samples to build our model. With so many variables, it is difficult to have enough data to cover the many possible combinations and make a well-fitting model.
If we use too many variables, we may also encounter overfitting.
Overfitting is the main problem that would make a data scientist think about
dimensional reduction.
We must identify data that we do not need or that is not relevant. If we have a model that predicts someone's income, do we need a variable that tells us what their favorite color is? Probably not. We can remove it from our dataset. Usually it is not so easy to determine when to delete a variable. There are some tools we can use to determine which variables are less important.
Principal Component Analysis (PCA) is a method of reducing dimensionality. We take the old set of variables and transform them into a new, smaller set. The new variables we create are called principal components. There is a tradeoff between reducing the number of variables and maintaining the accuracy of your model.
We can also standardize the values of our variables. Make sure that they are all measured on the same relative scale so that you don't inflate the importance of one variable, for example if some variables are measured as a probability between 0 and 1 while others are measured in whole numbers above 100.
Linear discriminant analysis is another method of reducing dimensionality, in which we combine attributes or variables rather than removing them altogether.
Kernel principal component analysis is a third method of reducing dimensionality. Here, too, the variables are transformed into a new set, but the transformation is nonlinear, which can give us even better insight into the real structure than the original data.
Neural networks
The best use of these neural networks would be a task that would be easy
for a human, but extremely difficult for a computer. Think of the beginning
of the book when we talked about reasoning and inductive reasoning. Our
human brain is a powerful tool for inductive reasoning; it is our advantage
over advanced computers that can calculate a large amount of data in
seconds. We model neural networks after human thinking because we try to
teach a computer how to 'reason' like a human. This is quite a challenge. As the examples we have mentioned show, we apply neural networks to tasks that would be very easy for a human but very challenging for a computer.
Neural networks can take a huge amount of computing power. The first
reason neural networks are a challenge to process is because of the amount
of data sets needed to create an accurate model. If you want the model to
learn how to sort photos, there are many subtle differences between photos
that the model must learn to complete the task effectively. This leads to the
next challenge, namely the number of variables required for a neural
network to function properly. The more data you use and the greater the number of variables analyzed, the more hidden layers the network needs. At any given time, hundreds or even thousands of features are
analyzed and classified through the model. Take self-driving cars as an
example. Self-driving cars have more than 150 nodes for sorting. This
means that the amount of processing power a self-driving car needs to make
decisions in a fraction of a second, while analyzing thousands of inputs
simultaneously, is quite large.
When sorting photos, neural networks can be very helpful, and the methods
used by data scientists are improving quickly. If I showed you a picture of a
dog and a picture of a cat, you could easily tell me which was a cat and
which was a dog. But a computer requires advanced neural networks and a large amount of data to learn to do the same.
A common problem with neural networks is overfitting. The model can
predict the values for the training data, but when exposed to unknown data
it is too specific for the old data and cannot make general predictions for
new data.
Suppose a math test is coming up and you want to study. You can memorize the formulas that you think will appear on the test and hope that when test day comes, you can simply plug the new information into what you have already memorized. Or you can study more deeply and learn how each formula works, so you can get good results even when conditions change. An overfitted model is like memorizing the formulas for a test. It does well if the new data is similar, but if there is variation it doesn't know how to adjust. You can usually tell that your model is overfitted if it performs well with training data but poorly with test data.
When we check the performance of our model, we can measure it by its
cost value. The cost value is the difference between the predicted value and
the actual value of our model.
One of the challenges with neural networks is that there is no way to
determine the relationship between specific inputs and the output. The
hidden layers are called hidden layers for a reason; they are too difficult to
interpret or understand.
The simplest type of neural network is called a perceptron. It derives its simplicity from the fact that it has only one layer through which data passes. The input layer leads to one classifying layer and the resulting prediction is a binary classification. Remember that when we call a classification technique binary, it means that it only sorts between two different classes, represented by 0 and 1.
The perceptron was first developed by Frank Rosenblatt. It is a good idea to
familiarize yourself with the perceptron if you want to learn more about
neural networks. The perceptron uses the same basic process as other neural network models, except that other models usually have more layers and more possible outputs. When data is received, the perceptron multiplies each input by the weight it is given. Then the sum of all these values is passed to the activation function. The activation function tells us what category the input falls into; in other words, it predicts the output.
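That process (multiply by weights, sum, apply a step activation) can be sketched in a few lines of Python with numpy; the weights and inputs below are made up, as a real perceptron would learn its weights from data:

```python
import numpy as np

def step(z):
    # The activation function: class 1 on one side of the threshold, class 0 on the other.
    return 1 if z > 0 else 0

def perceptron_predict(inputs, weights, bias):
    # Multiply each input by its weight, sum the results, and apply the activation.
    weighted_sum = np.dot(inputs, weights) + bias
    return step(weighted_sum)

# Hypothetical weights a trained perceptron might have ended up with.
weights = np.array([0.6, -0.4])
bias = -0.1

print(perceptron_predict(np.array([1.0, 0.5]), weights, bias))  # -> 1
print(perceptron_predict(np.array([0.2, 0.9]), weights, bias))  # -> 0
```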
If you plotted the perceptron's output on a graph, the line would look like a step, with two values, one on each side of the threshold. These two sides of the step are the different classes that the model will predict based on the input. As you can see from the graph, it is a bit rough because there is very little separation between the classes. Even a small change in an input variable can cause the predicted output to jump to a different class. Because it is a step function, it does not perform as well outside of the original dataset you used for training.
An alternative to the perceptron is a model called a sigmoid neuron. The main advantage of using the sigmoid neuron is that it is not strictly binary. Unlike the perceptron, which classifies data into two hard categories, the sigmoid function produces a probability rather than a hard classification. The image below shows the curve of a sigmoid neuron.
Note the shape of the curve around the threshold, where the perceptron's step makes it difficult to classify data with only marginal differences. With the sigmoid neuron, the data is predicted by the probability of falling into a particular class. As you can see, the line curves smoothly through the threshold, which means the probability of a data point falling into a certain class increases gradually rather than jumping; the output is just a probability.
Reinforcement learning
We have learned that diversifying our trees can create a more accurate
prediction. But what if, instead of using different versions of the same
model, we just used different models? This is a common trick in machine
learning, also known as ensemble modeling. By combining information from
multiple different types of algorithms, we can improve the accuracy and
predictability of our model.
Ensemble modeling is all about a divide-and-conquer mentality. Different models give us different insights about the data that other models might miss. By combining the information from different models, we can learn even more about the truth of our data.
Ensemble modeling also helps minimize bias and variance in our
predictions. Individual models may contain prediction errors, but the sum of
all our predictions will be more accurate.
There are a few different methods of using ensemble modeling:
The first is to take the mode of your predictions. That is, take the value most
common with the models. Whichever prediction occurs most often or has the
highest number of "votes" is the prediction we choose.
We can also take the average of all predictions depending on the type of
models we have. The average of all predictions becomes our final prediction.
Our ensemble must also take into account the reliability of individual
models. The results of our models are given different weights, making some
predictions more important than others based on reliability.
How do we know what kind of models we want to combine? We already
know from this book that there are different types of models to choose from,
each with different capabilities and benefits.
A common pair of models use neural networks and decision trees together.
Neural networks provide us with new information and the decision tree
ensures that we have not missed anything.
In addition to the bootstrapping and bagging we discussed earlier, there are a
few other ways to do ensemble modeling. Data scientists sometimes use a so-called bucket of models, where they try several different types of models on the data and then choose the one that performs best.
Another idea is called stacking. Stacking uses different types of models and
then uses all the results to give us a prediction that is a combination of all.
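For a sense of how little code this takes in practice, here is a sketch of stacking using the free scikit-learn library (covered later in this book). The choice of base models, a decision tree and a k-nearest-neighbors classifier combined by a logistic regression, is just one possible illustration, and the bundled iris dataset stands in for your own data.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    stack = StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        # the final estimator learns how to combine the base models' outputs
        final_estimator=LogisticRegression(max_iter=1000),
    )
    stack.fit(X_train, y_train)
    print(stack.score(X_test, y_test))  # accuracy of the combined model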
Data scientists like to use ensemble modeling because we can usually make
better predictions with a variety of models than with a single model alone.
The downside of ensemble modeling is that we lose some interpretability. Running multiple models at the same time makes the results harder to explain, especially if you want to share them with stakeholders who have no background in data science.
We can also use different versions of the same model, much as a random forest improves its predictions with many versions of itself. For example, we might train neural networks with different sets of nodes, or run a clustering algorithm with different values of k (the number of clusters), to see how that changes the outcome, whether there is an optimal value for k, and whether there are groups or subgroups we may have overlooked.
Ensembling does not add much if we already have a strong model. But if we combine a few models with weaker predictive power, it usually improves overall accuracy.
Things you need to know for machine
learning
To be successful with machine learning, you need the right tools for the job, just as you need the right skills and tools when building a house. Below is a list of the materials needed to do machine learning.
Data
In order to work with your data, you must have enough of it to divide into two sets: training data and test data.
Training data is the data that you initially use when building your model.
When you first create your model, you need to give it some data to learn
from. With training data you already know the independent variables and
their respective dependent variables. This means that for every input you
already know the output of your data. Based on this data, your model learns
to predict the output itself. Our training data gives us the parameters we
need to make predictions. This is the data our machine learns from.
Test data is the data the machine receives once you are satisfied with the model and want to see what it does in the wild. In this data we present only the independent variables, without the outputs. With test data, we can see how well our model predicts a result from new data.
Your training data should contain most of your data, about 70%, while your test data is the remaining 30%. To avoid bias, make sure that the split between training data and test data is completely random. Do not hand-pick which data goes where; let it be random, and never use the same data for both training and testing. Start by giving the training data to the machine so it can learn the relationships between X and Y, then use the test data to see how well your model performs.
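A minimal sketch of that random 70/30 split, using scikit-learn's train_test_split helper on made-up placeholder data:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Placeholder data: 100 rows of 4 independent variables and one output each
    X = np.random.rand(100, 4)
    y = np.random.randint(0, 2, size=100)

    # test_size=0.3 keeps roughly 70% of the rows for training and 30% for testing;
    # shuffle=True makes the split random rather than hand-picked
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, shuffle=True, random_state=42
    )
    print(len(X_train), len(X_test))  # 70 30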
The main question to consider during this process is whether your model will still work when presented with new data. You can test this through cross-validation, which means testing your model on data it has not yet seen. Keep some data on hand that you have not used during training so you can check how accurate your model is at the end.
You can also use k-fold validation to check the accuracy of your model. This method is quite easy to use and generally unbiased, and it is a good technique when we do not have a lot of data to test with. For k-fold validation, we split our data into k folds, usually between 5 and 10. Each fold takes a turn as the test set, and when you are done you look at how the model performed across all of the folds. Usually, the larger your number for k, the less biased your test will be.
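Here is one possible sketch of 5-fold cross-validation with scikit-learn. The bundled breast cancer dataset and the logistic regression model are only stand-ins; the point is that cv=5 gives each fold a turn as the test set.

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    # cv=5 splits the data into 5 folds; each fold takes one turn as the test set
    model = make_pipeline(StandardScaler(), LogisticRegression())
    scores = cross_val_score(model, X, y, cv=5)
    print(scores)         # one accuracy score per fold
    print(scores.mean())  # the average gives an overall estimate of accuracy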
So far we've talked about models that interpret data to find meaning and
patterns. But what kind of data are we going to use? Where do we get our
data from and what does it look like?
Data is the most critical part of machine learning. After all, your model only learns from data, so it is important that you have relevant and meaningful data. Data comes in many shapes and sizes, with different structures depending on the type of data. The more structured the data, the easier it is to work with. Some data has very little structure, and that data is more difficult to interpret. Face recognition data, for example, can be huge and means little to the untrained eye.
Structured data is better organized. This is the type of data you are likely to use when you first start. It will help you get your feet wet so you can start understanding the statistics involved in machine learning. Structured data usually comes in a familiar form of rows and columns, like the example below. This is called a tabular dataset.
Market value    num_bedrooms    num_bathrooms    sq_ft    pool (Y/N)
$207,367        4               3                2635     N
$148,224        3               2                1800     Y
$226,897        5               3.5              2844     Y
$122,265        2               1.5              1644     N
Prepare the data
So now you have your data, but how do you get it to a point where it is
readable by your model? Data seldom fits directly with our modeling needs.
In order to properly format our data, a round of data cleanup is usually
required first. The data cleaning process is often referred to as data
scrubbing.
We may have data in the form of images or emails. We have to convert it into numerical values that can be interpreted by our algorithms. After all, our machine learning models are algorithms, mathematical equations, so the data must have numerical values to be modeled.
You may also have pieces of data that were recorded incorrectly or in the wrong format. There may be variables you don't need and have to remove. Cleaning can be tedious and time consuming, but it is extremely important to end up with data that works and can be easily read by your model. It is the least sexy part of being a data scientist.
This is the part of machine learning where you are likely to spend the most time. As a data scientist, you will probably spend about 20% of your time building and analyzing models and the other 80% making sure your data is clean and ready to be processed by your model. We may be combining multiple types of data, and we will need to reformat the records so they match. First, in the case of guided learning, choose the variables that you think are most important to your model. Choosing irrelevant variables, or variables that don't matter, can introduce bias and make our model less effective.
A simple example of cleaning or scrubbing data is recoding a response for gender. Say your data has a column recording male/female. Unfortunately, "male" and "female" are not numerical values. But you can easily change this by making it a binary variable: assign female = 1 and male = 0. Now you can measure the numerical effect that being female has on the outcome of your model.
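In pandas, that recoding is a one-liner. The column name and values below are hypothetical:

    import pandas as pd

    # Hypothetical column of text responses
    df = pd.DataFrame({"gender": ["female", "male", "female", "male"]})

    # Recode the text as a binary variable: female = 1, male = 0
    df["female"] = df["gender"].map({"female": 1, "male": 0})
    print(df)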
We can also combine variables to make it easier to interpret. Let's say you
create a regression model that predicts a person's income based on several
variables. One of the variables is the level of education, which you have
recorded in years. So the possible responses for years of education are 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16. That is a lot of separate categories. You could simplify it by creating groups. For example, you can recode 1 through 8 as primary_ed, 9 through 12 as secondary_ed, and 13 through 16 as tertiary_ed. Instead of sixteen categories, you have three: respondents have either a primary education, a secondary education, or some level of post-secondary or university education. This is known as binning data, and it can be a good way to clean up your data if used properly.
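A possible pandas sketch of this binning, with hypothetical column names and made-up values:

    import pandas as pd

    df = pd.DataFrame({"years_of_education": [3, 8, 11, 12, 14, 16]})

    # Collapse sixteen possible values into three bins:
    # 1-8 -> primary_ed, 9-12 -> secondary_ed, 13-16 -> tertiary_ed
    df["education_level"] = pd.cut(
        df["years_of_education"],
        bins=[0, 8, 12, 16],
        labels=["primary_ed", "secondary_ed", "tertiary_ed"],
    )
    print(df)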
When you combine variables to make interpretation easier, you have to
balance more streamlined data with losing important information about
relationships in the data. Note that in this example, by combining these
variables into three groups instead of sixteen, you create bias in your model.
There are many other issues that can require you to clean your data. Even a misspelling or an extra space somewhere in your data can have a negative effect on your model.
You may also have missing data. To fix this, you can replace the missing values with the mode or the median of that variable. It is also possible to delete the records with missing values if there are only a few, but that just means you have less data to use in your model.
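As a small illustration, here is how you might handle missing values with pandas; the column name and values are made up:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"sq_ft": [2635, 1800, np.nan, 1644, np.nan, 2844]})

    # Option 1: replace missing values with the median of the column
    df["sq_ft_filled"] = df["sq_ft"].fillna(df["sq_ft"].median())

    # Option 2: simply drop the rows with missing values (you keep less data)
    df_smaller = df.dropna(subset=["sq_ft"])

    print(df["sq_ft_filled"])
    print(len(df_smaller))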
Programming tools
To process your data, you need programming tools so you can tell the computer what you want to do with the data. We have already mentioned that machine learning is a branch of computer science; this is where that comes into play. In the introduction, we said that the three most common languages for data science are Python, R, and C++. Choosing the right one depends on your experience and what you plan to do with your data.
The most common language for data science is Python. It was created in 1991 by Guido van Rossum and is notable for being easier to read than other programming languages. It is still being developed and improved. It is not complicated to learn and is compatible with the most relevant data types. It also has applications beyond data manipulation that will be useful in machine learning.
Python has several free packages you can install that provide shortcuts to commonly used data science tools. These packages wrap up code that is used over and over in machine learning, so you have less work to do.
Pandas is an indispensable library of tools for data scientists working with Python. It makes it easier to manipulate time series and tabular data. It shows your data in rows and columns so that it is easier to manage, much the same way you would look at data in Microsoft Excel. It is easy to find online and free to download. Pandas is especially useful when working with datasets in .csv format.
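A typical first step with pandas looks something like this; "housing.csv" is just a placeholder file name for a tabular dataset like the one shown earlier:

    import pandas as pd

    # "housing.csv" is a placeholder name for your own .csv file
    df = pd.read_csv("housing.csv")

    print(df.head())      # the first five rows, shown as rows and columns
    print(df.describe())  # quick summary statistics for the numeric columns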
NumPy is a useful library for processing data faster with Python. It works with matrices and multidimensional arrays, much like MATLAB, and it will help you import and handle large data sets more easily.
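A small taste of what NumPy arrays look like in practice, with made-up numbers:

    import numpy as np

    # A small matrix (two-dimensional array) and a vector
    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    x = np.array([0.5, -1.0])

    print(A @ x)           # matrix-vector product
    print(A.T)             # transpose
    print(A.mean(axis=0))  # column means, computed across the whole array at once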
Scikit-learn is another library of machine learning functions. With scikit-learn, you have easy access to many of the algorithms we mentioned earlier that are often used in machine learning. Algorithms for classification, regression, clustering, support vector machines, random forests, and k-means come ready-made, so much of the grunt coding is done for you.
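As an illustration of how much grunt work the library saves, here is a sketch that trains a random forest classifier on scikit-learn's bundled iris dataset; your own data and model choice would of course differ:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # The library does the heavy lifting: the algorithm is a few lines of setup
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    print(forest.score(X_test, y_test))  # accuracy on the held-out test data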
R is the third option. It is free to use and open source. R can be used for both
data mining and machine learning. It is popular for those new to data science
because of its availability. It can't handle the larger datasets required for
more advanced machine learning operations, but it's not a bad place to start
if you're new to data science and computer programming.
You need a computer to run these programs. Usually a regular laptop or desktop computer is powerful enough to handle small and medium-sized data sets, especially if you are new to machine learning.
While graphics processing units (GPUs) have been around for some time, they have become much more affordable in recent years, which has made data science more accessible. This is a breakthrough because the field is no longer limited to labs with huge computers.
GPUs are best known as the power behind video games. They allow a computer to work on many data points at once, which is essential for processing large amounts of data, and they let us do much more with much less computer hardware. A CPU has only a handful of general-purpose cores, so it processes information in relatively few streams at a time. A GPU, by contrast, has a much larger web of simpler cores that can all handle different pieces of work at once; a single GPU card can contain nearly 5,000 processing cores. This is a major advance for artificial intelligence and machine learning, and it helps speed up the training of neural networks.
C and C++ are other commonly used languages for data analysis. The advantage of C++ is that it is a very powerful language: it can process huge data sets very quickly. Data scientists who work with massive data sets often choose C++ because of its speed and processing power, especially when working with data sets over a terabyte. C++ can process a gigabyte of data in about a second. This makes it especially useful for deep learning algorithms, neural network models with five to ten layers, and huge data sets; that kind of model can overwhelm slower software. If you are doing more advanced machine learning and you have multiple GPUs, then C++ may be the language for you. C++ can do almost anything; it is a very versatile language.
The downside is that the libraries for C++ are not as extensive as those for Python. This means that when you write code for your data and model, you will probably be starting from scratch. No matter what kind of project you decide to do, there will be roadblocks as you write your code, and having a library that can help when you get stuck lets you learn and work faster.
Develop models