Introduction To Data Science
• Department Of B.B.A.(C.A.)
• Class:-S.Y.B.B.A.(C.A.)
• Subject:-Big Data
• Chapter No:-2
• Chapter Name:-Introduction to Data Science
• Presented By:-Asst.Prof. Shinde Akshay M.
Introduction to Big Data By Asst.Prof. Shinde Akshay M. 18/11/2024
Big Data Analytics
• Big Data analytics is the process of collecting,
organizing and analyzing large sets of data (called Big
Data) to discover patterns and other useful information.
(Figure: Big Data analytics stages, labelled Requirement, Data, Process, Organize and Cleaning)
Types Of Data Analytics
1. Descriptive: What happened?
• Descriptive analytics answers the question of what happened.
• This type of analysis describes or summarizes raw (past) data
into something explainable and meaningful.
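As a minimal sketch of descriptive analytics, the snippet below summarizes a list of past values into a few explainable numbers (count, total, mean, median, minimum, maximum). The sales figures and the helper name `describe` are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical past monthly sales figures (illustrative data, not from the source)
past_sales = [120, 135, 150, 110, 160, 145]

def describe(values):
    """Summarize raw past data into a few explainable numbers (hypothetical helper)."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

summary = describe(past_sales)
print(summary)
```

The output answers "what happened?" only; it says nothing about why it happened or what will happen next.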
Sample
• A sample is the specific group that you will collect
data from.
• The size of the sample is always less than the total
size of the population.
Population
A population is the complete collection of similar objects,
individuals, or entities that share some common observable
characteristics. Each object in the population is termed an
“elementary unit”.
Finite population
This is a type of population in which the number of
elementary units is exactly quantifiable.
Example- Books in a university library.
Infinite population
In this type of population, the count of elementary
units cannot be determined with certainty.
Example- Population of a country. The population
of a country usually cannot be counted exactly at any
given moment, although it can be approximated. This is
because the numbers of births and deaths change every second.
Real population
This type of population is based on real, observed data,
so the information is concrete and reliable. It does not
require approximation or hypothetical data.
Example- Employees working in a company.
Hypothetical population
This can be a finite or infinite imaginary population
designed by a researcher. Usually, the researcher takes
a real-world scenario and applies his/her hypotheses or
assumptions to derive the structure and information of
the population.
Example- The possible outcomes of a die rolled n times.
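The die example can be made concrete with a short sketch that enumerates the hypothetical population of outcomes for a fair six-sided die rolled n times. For a fixed n this population is finite, with 6^n elementary units. The function name `die_outcomes` is hypothetical.

```python
from itertools import product

def die_outcomes(n):
    """Enumerate every possible sequence of faces when a six-sided die is rolled n times."""
    faces = range(1, 7)
    return list(product(faces, repeat=n))

outcomes = die_outcomes(2)
print(len(outcomes))  # 36, i.e. 6 ** 2 possible outcomes
print(outcomes[:3])   # [(1, 1), (1, 2), (1, 3)]
```

No die is actually rolled here: the population is built purely from the researcher's assumption that each face can occur on every roll.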
Sample
Sample size
The number of items in a sample is called a sample
size.
Example- Out of 50k employees, 5k were selected
for analysis, which makes the sample size 5k.
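A minimal sketch of the employee example, assuming simple random sampling (the slides do not name a sampling method): 5,000 distinct employee IDs are drawn from a hypothetical population of 50,000, so the sample size is 5,000. The ID range and the helper `simple_random_sample` are illustrative assumptions.

```python
import random

# Hypothetical population: 50,000 employee IDs (illustrative, not real data)
population = list(range(1, 50_001))

def simple_random_sample(units, size, seed=None):
    """Draw `size` distinct elementary units, each equally likely to be chosen."""
    if size > len(units):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(units, size)

sample = simple_random_sample(population, 5_000, seed=42)
print(len(sample))  # sample size: 5000
```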
Characteristics of the sample
1. Representativeness
The following three basic steps are used iteratively until an
appropriate model for the data has been developed:
A] Model Selection:-
In the model selection step, plots of the data, process knowledge and
assumptions about the process are used to determine the form of the model
to be fitted to the data.
B] Model Fitting:-
Model fitting estimates the values of the unknown parameters in the selected model.
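As an illustration of model fitting, the sketch below assumes the selected model is a straight line y = a + b*x and estimates its two unknown parameters by ordinary least squares, one common fitting method (the slides do not prescribe a specific one). The data points and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Estimate intercept a and slope b of y = a + b*x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)  # 1.0 2.0
```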
C] Model Validation:-
Model validation checks whether the fitted model is adequate, i.e. whether the model is useful for us or not.
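One possible way to validate a model, sketched under stated assumptions: apply an already fitted model to hold-out data it was not fitted on and compute the coefficient of determination (R²). The model y = 1 + 2x, the hold-out points and the 0.9 acceptance threshold are all hypothetical; in practice the threshold depends on the problem. If validation fails, the cycle returns to model selection.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: share of variance in `actual` explained by the model."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_a) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("actual values must not all be equal")
    return 1 - ss_res / ss_tot

def validate(model, xs, ys, threshold=0.9):
    """Return (is_useful, score); the 0.9 threshold is a hypothetical choice."""
    predicted = [model(x) for x in xs]
    score = r_squared(ys, predicted)
    return score >= threshold, score

def model(x):
    # Hypothetical fitted model y = 1 + 2x
    return 1 + 2 * x

# Hypothetical hold-out data
held_out_x = [5, 6, 7, 8]
held_out_y = [11.2, 12.8, 15.1, 16.9]
ok, score = validate(model, held_out_x, held_out_y)
print(ok, round(score, 3))
```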
Reasons to Learn Statistical Modeling
A) You will be better equipped to choose the right model for your needs.
• There are many different types of statistical models, and an effective
data analyst needs to have a comprehensive understanding of them
all.
• In each scenario, you should be able to identify not only which
model will best answer the question at hand, but also which
model is most appropriate for the data you’re working with.
B) You will be better able to prepare your data for analysis.
• Data is rarely ready for analysis in its raw form. To ensure your
analysis is accurate and viable, the data must first be cleaned up.
This cleanup often includes organizing the gathered information
and removing “bad or incomplete data” from the sample.
• “Before any statistical model can be completed, you need to
explore [and] understand the data. If there is no quality [in
the data], then you can’t really derive any insights from it.”
• Once you know how various statistical models work and how
they leverage data, it will become easier for you to
determine what data is most relevant to the question you
are trying to answer, as well.
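A minimal sketch of the cleanup step described above, assuming records are plain dictionaries: incomplete records (any missing field), bad records (an implausible age) and exact duplicates are removed before analysis. The records, field names and the 0 to 120 age rule are hypothetical, chosen only for illustration.

```python
# Hypothetical raw survey records (illustrative only)
raw = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},  # incomplete: missing age
    {"id": 3, "age": -5, "income": 61000},    # bad: impossible age
    {"id": 4, "age": 29, "income": 45000},
    {"id": 4, "age": 29, "income": 45000},    # exact duplicate record
]

def is_valid(record):
    """Keep a record only if every field is present and age is plausible (hypothetical rule)."""
    if any(value is None for value in record.values()):
        return False
    return 0 <= record["age"] <= 120

def clean(records):
    """Drop incomplete, implausible and duplicate records, preserving the original order."""
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint for duplicate detection
        if is_valid(record) and key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned

cleaned = clean(raw)
print([r["id"] for r in cleaned])  # [1, 4]
```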
C) You will become a better communicator.
• In most organizations, data analysts are required
to communicate their findings to two different
audiences.
Univariate Data: