Difference Between Statistical Model and Machine Learning
Statistical Model:
A statistical model is a mathematical description of the population from which a sample was drawn.
It allows us to make predictions about future samples from that population.
Example: estimating a stock’s future price from previous data, as in time series analysis.
Statistical models are also used to support or test a claim, for example through hypothesis testing
and p-values.
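To make the idea of a p-value concrete, here is a minimal sketch of an exact two-sided binomial test written with the Python standard library only. The function name binomial_p_value, the data (60 "up" days out of 100), and the 0.05 threshold are illustrative assumptions, not part of the original text.

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value under the null hypothesis p = p_null."""
    def prob(k: int) -> float:
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = prob(successes)
    # Sum the probability of every outcome at least as unlikely as the observed one.
    # The tiny tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(trials + 1) if prob(k) <= observed * (1 + 1e-9))

# Hypothetical data: 60 "up" days out of 100 trading days.
p_value = binomial_p_value(60, 100)
print(f"p-value = {p_value:.4f}")
# Assumed conventional threshold: reject the null hypothesis if p_value < 0.05.
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

A small p-value means the observed result would be unusual if the null hypothesis (here, up and down days being equally likely) were true.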
Assumptions of a statistical model:
Independence: there should be no relationships between the observations in
the collection.
Normality: the response variable’s distribution should be approximately normal, with
data symmetric around the mean.
Linearity: the relationship between the response variable and the predictor
variable(s) should be linear.
Outliers: the dataset should not contain outliers that may unduly influence the results.
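The outlier assumption can be checked before a model is fitted. The sketch below flags outliers with the 1.5 x IQR rule, which is one common convention and an assumption here, since the text does not name a specific method. The function name iqr_outliers and the sample values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample with one suspicious reading.
sample = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(sample))
```

Flagged values are candidates for inspection; whether to remove them depends on why they occurred.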
Types of statistical models:
1. Parametric: a family of probability distributions that is described by a finite number
of parameters.
2. Nonparametric: models in which the kind and number of parameters are flexible
and not fixed in advance.
3. Semiparametric: models that have both a parametric and a nonparametric component.
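The contrast between the parametric and nonparametric approaches can be sketched by estimating the same probability in two ways. The parametric estimate assumes a normal distribution and fits only two parameters; the nonparametric estimate uses the empirical distribution of the data directly. The sample values and the helper empirical_cdf are hypothetical.

```python
from statistics import NormalDist

# Hypothetical sample drawn from an unknown population.
sample = [2.1, 2.5, 2.7, 3.0, 3.2, 3.4, 3.9, 4.4]

# Parametric: assume a normal distribution, so only two parameters
# (mean and standard deviation) are estimated from the data.
normal = NormalDist.from_samples(sample)

def empirical_cdf(data, x):
    """Nonparametric estimate of P(X <= x): the fraction of observations <= x."""
    return sum(1 for v in data if v <= x) / len(data)

x = 3.0
print(f"parametric    P(X <= {x}) = {normal.cdf(x):.3f}")
print(f"nonparametric P(X <= {x}) = {empirical_cdf(sample, x):.3f}")
```

The parametric estimate is only as good as the normality assumption, while the nonparametric one makes no such assumption but needs more data to be precise.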
Machine Learning:
Machine Learning is the science that allows computers to learn and improve their learning over time,
by feeding them data and information in the form of observations and real-world interactions.
According to Arthur Samuel, machine learning is “the field of study that gives computers the ability
to learn without being explicitly programmed”.
OR
According to Tom Mitchell, “Machine learning is the study of computer algorithms that allow
computer programs to improve through experience automatically”.
Example: Predicting a house price from attributes such as location and area. With the help of
machine learning we can find the relationship between the dependent variable (i.e., house price)
and the independent features (i.e., location, area, year of construction), and then use the
resulting relationship to predict the price for a new input.
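The house price example can be sketched with a one-feature least-squares fit in plain Python. This is a minimal illustration under stated assumptions: only area is used as a feature (location would need to be encoded numerically first), the data are made up, and the names fit_line and predict_price are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: area in square metres, price in thousands.
areas = [50, 70, 90, 110, 130]
prices = [150, 200, 260, 310, 360]

# "Learning" step: estimate the relationship from past data.
slope, intercept = fit_line(areas, prices)

def predict_price(area):
    """Predict a price for a new house from the learned relationship."""
    return intercept + slope * area

print(f"predicted price for 100 m^2: {predict_price(100):.1f}")
```

The same fitted line serves both views discussed in this article: its coefficients describe the relationship, and predict_price applies it to unseen inputs.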
Assumptions of machine learning models:
IID data: the data is independent and identically distributed (IID), which means that every
data point is independent of the others and has the same distribution.
Linearity: some models, such as linear regression, assume a linear relationship between the
input variables and the output variable.
Normality: some models presuppose that the model’s input variables and/or error terms are
normally distributed.
No multicollinearity: linear models presuppose that the input variables are not highly
correlated with one another.
Large sample size: certain models rely on the sample size being sufficiently large to guarantee
precise parameter estimates.
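A quick way to screen for multicollinearity is to compute the pairwise correlation between input variables. The sketch below uses the Pearson correlation coefficient; the feature values and the 0.9 cut-off are assumptions chosen for illustration, not values from the text.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical input features for five houses.
area_m2 = [50, 70, 90, 110, 130]
rooms = [2, 3, 4, 5, 6]          # grows in lockstep with area
age_years = [30, 5, 18, 42, 11]

THRESHOLD = 0.9  # assumed cut-off for "highly correlated"
for name, feature in [("rooms", rooms), ("age_years", age_years)]:
    r = pearson(area_m2, feature)
    flag = "possible multicollinearity" if abs(r) > THRESHOLD else "ok"
    print(f"area_m2 vs {name}: r = {r:.2f} ({flag})")
```

When two inputs are flagged, a common remedy is to drop one of them or combine them into a single feature.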
Model Comparison
The differences between statistical models and machine learning are as follows:

Statistical Model                           Machine Learning
------------------------------------------  --------------------------------------------------
The relationship between variables is       The relationship between variables is found by a
expressed as mathematical equations.        self-learning algorithm that learns from the data
                                            without relying on explicitly programmed rules.

It is not well suited to large amounts      It can handle data sets ranging from small to
of data.                                    large.

It gives the best estimate of the           It has strong predictive ability because it
relationship between variables.             learns from past data.