
Vidya Pratishthan’s Arts, Science and Commerce College, Vidyanagri, Baramati

• Department: B.B.A.(C.A.)
• Class: S.Y.B.B.A.(C.A.)
• Subject: Big Data
• Chapter No: 2
• Chapter Name: Introduction to Data Science
• Presented By: Asst. Prof. Shinde Akshay M.
Big Data Analytics
• Big Data analytics is the process of collecting,
organizing and analyzing large sets of data (called Big
Data) to discover patterns and other useful information.

• Data analytics is the science of analyzing raw data in order to make conclusions about that information.

• Big Data analytics can help organizations to better understand the information contained within the data, and will also help identify the data that is most important to the business and future business decisions.
• Data Analytics involves applying an algorithmic or mechanical process to derive insights, for example, running through several data sets to look for meaningful correlations between them.
Steps Involved in Data Analytics
Data analytics involves several different steps:
1. The first step is to determine the data requirements or how
the data is grouped. Data may be separated by age,
demographic, income, or gender. Data values may be
numerical or be divided by category.
2. The second step in data analytics is the process of collecting the data. This can be done through a variety of sources such as computers, online sources, cameras, environmental sources, or through personnel.
3. Once the data is collected, it must be organized so it can be
analyzed. Organization may take place on a spreadsheet or
other form of software that can take statistical data.
4. The data is then cleaned up before analysis. This means it is scrubbed and checked to ensure there is no duplication or error, and that it is not incomplete. This step helps correct any errors before the data goes on to a data analyst to be analyzed.
Steps Involved in Data Analytics
Data Requirement → Process → Organize → Cleaning
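The sketch below is not from the slides; it illustrates where each of the four steps sits in code, using pandas on a hypothetical file "customers.csv" with made-up columns.

```python
# A minimal sketch of the four steps, assuming a hypothetical CSV file
# "customers.csv" with made-up columns (age, income, gender, spend).
import pandas as pd

# 1. Determine requirements: group customers by gender and age band.
# 2. Collect: read the raw data from a file (one possible source).
df = pd.read_csv("customers.csv")

# 3. Organize: keep only the columns needed for the analysis.
df = df[["age", "income", "gender", "spend"]]

# 4. Clean: drop duplicates and incomplete rows before analysis.
df = df.drop_duplicates().dropna()

# Hand over to analysis, e.g. average spend per gender and age band.
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 50, 100])
print(df.groupby(["gender", "age_band"])["spend"].mean())
```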
Types Of Data Analytics
1. Descriptive: What is happening?
1. Descriptive analytics answers the question
of what happened.
2. This type of analysis describes or summarizes raw data (past data) into something explainable and meaningful.

3. With the help of descriptive analysis, we analyze and describe the features of the data.
4. In the descriptive analysis, we deal with
the past data to draw conclusions and present
our data in the form of dashboards.
5. It looks at the past performance and understands it by mining historical data to understand the cause of success or failure in the past.
6. In businesses, descriptive analysis is used for determining the Key Performance Indicator or KPI to evaluate the performance of the business.
7. Descriptive analytics looks at data and analyzes past events for insight into how to approach future events.
8. This is mostly used to summarize different aspects of a particular business, describe what’s going on in a particular organization, and when it’s required to understand activities at an aggregate level.
9. This relates to describing the past and is useful as it allows us to analyze past behaviors and how they could have an impact in the near future.
10. Almost all management reporting such as
sales, marketing, operations, and finance uses
this type of analysis.
 Common examples of descriptive analytics are company reports that provide a historic review, like:
•Data queries
•Reports
•Descriptive statistics
•Data dashboards
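As a small illustration (not part of the original slides), a few lines of pandas can produce the descriptive statistics and report-style summaries listed above; the table and numbers are made up.

```python
# A minimal descriptive-analytics sketch on a small made-up sales table.
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "revenue": [120, 95, 130, 110],
})

# Descriptive statistics summarise past data: counts, means, spread, extremes.
print(sales["revenue"].describe())

# A typical management report / dashboard query: revenue per region.
print(sales.groupby("region")["revenue"].sum())
```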
2. Diagnostic: Why is it happening?
1. Diagnostic Analytics examines data to answer
the question “Why did it happen?”.
2. This is characterized by various techniques such as Drill-Down, Data Discovery, Data Mining and Correlations.
3. These techniques allow the users to go towards deeper analysis, which will result in justifying why certain activities or situations have occurred in an organization.
4. On assessment of the descriptive data, diagnostic analytical tools will empower an analyst to drill down and, in so doing, isolate the root cause of a problem.
5. At this stage, historical data can be measured against other data to answer the question of why something happened.
6. Diagnostic analytics gives in-depth insights
into a particular problem.
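A minimal drill-down sketch, using hypothetical monthly sales figures, shows how diagnostic analytics isolates where a change came from; all names and numbers below are illustrative.

```python
# Descriptive analytics shows total revenue fell; diagnostic analytics
# drills down by region to isolate where the drop came from.
import pandas as pd

sales = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb"],
    "region":  ["North", "South", "North", "South"],
    "revenue": [100, 80, 100, 40],
})

# Drill-down: break the monthly totals out by region.
pivot = sales.pivot_table(index="region", columns="month",
                          values="revenue", aggfunc="sum")
print(pivot)                       # South dropped from 80 to 40; North is unchanged.
print(pivot["Feb"] - pivot["Jan"]) # change per region points at the root cause
```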
3. Predictive: What is likely to happen?
1. Predictive Analysis answers the question “What
is likely to happen?”

2. It uses the findings of descriptive and diagnostic analytics to detect clusters and exceptions, and to predict future trends, which makes it a valuable tool for forecasting.
3. Predictive models typically utilize a variety of variable data to make the prediction.
4. The variability of the component data will have a relationship with what it is likely to predict (e.g. the older a person, the more susceptible they are to a heart attack – we would say that age has a linear correlation with heart-attack risk).
5. Predictive models are some of the most important models utilised across a number of fields.
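As an illustrative sketch only, the heart-attack example above can be mimicked with made-up (age, risk) numbers: measure the linear correlation, fit a line, and predict a value for a new age.

```python
# A minimal predictive sketch using made-up (age, risk) data.
import numpy as np

age  = np.array([30, 40, 50, 60, 70])
risk = np.array([0.05, 0.08, 0.12, 0.18, 0.25])   # hypothetical risk scores

# Strength of the linear relationship between age and risk.
print(np.corrcoef(age, risk)[0, 1])

# Fit a straight line and predict the risk for a new age.
slope, intercept = np.polyfit(age, risk, deg=1)
print(slope * 80 + intercept)   # predicted risk at age 80
```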
Deeper Insights
Gain deep insights regarding your business, processes,
functioning, and customer needs. With us, you can make
everything better.
Unknown Patterns
Uncover the unknown industry patterns, adopt trends quickly,
and grab new opportunities for enhanced, data-driven
working.
Customer Understanding
With data, understand your customer, their needs, and the
methods to fulfil these needs for enhanced customer
engagement.
High Business Performance
Data-driven insights allow organizations to drive high
business performance and know what is right or wrong for
your business.
Strategic Decisions
Data gives you the power to take strategic business
decisions. You can understand the strategic needs of your
organization.
Predictive Behaviour
Predict the behaviour of new processes and procedures even
Prescriptive: What do I need to do?
1. The next step up in terms of value and
complexity is the prescriptive model.

2. Prescriptive analytics is used to literally prescribe what action to take when a problem occurs.
3. The prescriptive model utilizes an
understanding of what has happened, why it
has happened and a variety of “what-might-
happen” analysis to help the user determine the
best course of action to take.
4. Prescriptive analysis is typically concerned not just with one individual action, but in fact with a host of other actions.
5. A good example of this is a traffic application helping you choose the best route home, taking into account the distance of each route, the speed at which one can travel on each road and, crucially, the current traffic constraints.
It uses vast data sets and intelligence to analyze the outcomes of the possible actions and then select the best option.

Prescriptive analytics uses sophisticated tools and technologies, like machine learning, business rules and algorithms, which makes it sophisticated to implement and manage.
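A toy sketch in the spirit of the traffic-application example: given a few candidate routes with assumed distances, speeds, and delays, recommend the one with the lowest estimated travel time. Every value below is invented for illustration.

```python
# Prescriptive sketch: evaluate possible actions (routes) and pick the best.
routes = [
    {"name": "Highway",    "distance_km": 20, "speed_kmh": 80, "traffic_delay_min": 15},
    {"name": "City roads", "distance_km": 12, "speed_kmh": 40, "traffic_delay_min": 5},
    {"name": "Back roads", "distance_km": 16, "speed_kmh": 50, "traffic_delay_min": 0},
]

def travel_time_min(route):
    # base driving time plus the current traffic delay
    return route["distance_km"] / route["speed_kmh"] * 60 + route["traffic_delay_min"]

best = min(routes, key=travel_time_min)
print(best["name"], round(travel_time_min(best), 1), "minutes")
```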
Statistical inference
• Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution.
• It is also known as Statistical Induction.
• Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates.
• It is assumed that the observed data set
is sampled from a larger population.
• Inferential statistics can be contrasted with descriptive statistics.
• Descriptive statistics is solely concerned with properties
of the observed data, and it does not rest on the
assumption that the data come from a larger
population.
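For instance, a one-sample t-test is one common way to test a hypothesis about a population mean from a sample. The sketch below uses synthetic data and scipy.stats purely for illustration.

```python
# A minimal inference sketch: test a hypothesis about a population mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=52, scale=10, size=100)   # observed sample (synthetic)

# Point estimate of the population mean from the sample.
print(sample.mean())

# One-sample t-test of the hypothesis "the population mean is 50".
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(t_stat, p_value)   # a small p-value is evidence against the hypothesis
```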
Population

• A population is the entire group that you want to draw conclusions about.

Sample
• A sample is the specific group that you will collect
data from.
• The size of the sample is always less than the total
size of the population.
Population
It is the collection of a specified group of similar objects, individuals, or entities that have some common observable characteristics. Each such object is termed an “elementary unit”.

Example: Consider a list consisting of the names of all the employees in a company; it is a population, and each employee is considered an elementary unit.
Types of Population

Finite population
This is a type of population in which the number of
elementary units is exactly quantifiable.
Example- Books in a university library.

Infinite population
In this type of population, the count of elementary units cannot be quantified with certainty.
Example: Population of a country. The population of a country cannot be counted exactly at any moment (it can only be approximated), because the numbers of births and deaths are changing every second.
Real population
This is such a type of population that is mostly
based on real-time data and the information is
concrete and reliable. This population does not
require approximation or hypothetical data.
Example- Employees working in a company.

Hypothetical population
This can be a finite or infinite imaginary population
designed by a researcher. Here mostly, the
researcher will take a real-time scenario and apply
his/her common hypothesis or assumptions to
draw the structure and information of a population.
Example- Possible outcomes of a die if rolled ’n’
times.
Sample

A part of the population, drawn according to a rule or plan in order to draw conclusions about its characteristics, is called a sample.

Sample size
The number of items in a sample is called the sample size.
For example, out of 50k employees, 5k were selected for analysis, which makes the sample size 5k.
Characteristics of the sample

A sample should follow certain characteristics to make it fit for data analysis.

1. Representativeness

A sample should represent the overall behavior of a population. In the above example, the 5k employees selected out of 50k should reflect the behavior of the whole workforce.
2. Homogeneity
Homogeneity is nothing but the matching of behavior across multiple samples.
Imagine we want to calculate the mean salary of the 50k employees and we have 3 samples, each of sample size 5k (see the sketch after this list):
· Sample 1 has a mean salary of $40k
· Sample 2 has a mean salary of $38k
· Sample 3 has a mean salary of $41k
We can say that these samples are homogeneous, since all samples give approximately equal information regarding the salary of the employees.

3. Adequacy

The number of sampling units in a sample should be adequate for doing the research.
In the above example, out of 50k employees, it would not be effective to draw a sample of size 5 or 6 for doing research.
4. Similar regulating conditions

There should be a similar way of selecting samples if there is a need for multiple samples.

In the above example, out of 50k employees, a sample of 5k employees was chosen at random; if we are selecting another sample, it should also be chosen randomly.
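The sketch below ties these four characteristics together on a synthetic “population” of 50k salaries: adequately sized random samples give homogeneous, representative means, while a tiny sample does not. All numbers are made up.

```python
# Sampling sketch on a synthetic population of 50k salaries.
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=40_000, scale=5_000, size=50_000)   # 50k salaries

# Three samples of adequate size (5k), each selected at random under the
# same regulating conditions.
samples = [rng.choice(population, size=5_000, replace=False) for _ in range(3)]

# Homogeneity: the sample means are approximately equal and close to the
# population mean (representativeness).
print([round(s.mean()) for s in samples], round(population.mean()))

# Inadequate sample: a sample of size 5 gives a much less stable estimate.
print(round(rng.choice(population, size=5).mean()))
```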
Some important terminologies
Sampling unit
Similar to the elementary unit, each element in the sample is
called a sampling unit. Here out of 5k employees, each of the
employees will be a sampling unit.
Sampling frame
A complete list of sampling units, maps, or other acceptable
material, which represents the population to be sampled is called
the sampling frame.
Statistical Modelling

• Statistical modeling is the process of applying statistical analysis to a dataset.

• A statistical model is a mathematical representation (or mathematical model) of observed data.
When data analysts apply various statistical models to the data they are investigating, they are able to understand and interpret the information more strategically. Rather than sifting through the raw data, this practice allows them to identify relationships between variables, make predictions about future sets of data, and visualize that data.
 Statistical Modelling:
In simple terms, statistical modelling is a simplified, mathematically formalized way to approximate reality (i.e. what generates your data) and optionally to make predictions from the observations.

• All commonly used statistical procedures can be put into a general modelling framework.
This is of the form: Data = Pattern + Residual

• Variation in the observed data can be split into two components:
1] Pattern: systematic or explained variation
2] Residual: leftover or unexplained variation
Basic Steps of Statistical Model Building are:-
A] Model Selection
B] Model Fitting
C] Model Validation

These three basic steps are used iteratively until an appropriate model for the
data has been developed

A] Model Selection:
In the model selection step, plots of the data, process knowledge and assumptions about the process are used to determine the form of the model to be fit to the data.

B] Model Fitting:
The model fitting step estimates the unknown parameters in the model.

C] Model Validation:
Model validation checks whether the fitted model is actually useful, i.e. whether it adequately describes the data.
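A minimal sketch of the Data = Pattern + Residual idea and the three model-building steps, on synthetic data with an assumed straight-line model.

```python
# Data = Pattern + Residual on synthetic data with a straight-line model.
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(20)
data = 3.0 * x + 5.0 + rng.normal(scale=2.0, size=x.size)   # observed data

# Model selection: plots / process knowledge suggest a straight line.
# Model fitting: estimate the unknown slope and intercept.
slope, intercept = np.polyfit(x, data, deg=1)
pattern = slope * x + intercept          # systematic (explained) variation
residual = data - pattern                # leftover (unexplained) variation

# Model validation: residuals should look like unstructured noise around 0.
print(round(slope, 2), round(intercept, 2), round(residual.mean(), 3))
```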
Reasons to Learn Statistical
Modeling
A)You will be better equipped to choose
the right model for your needs.
• There are many different types of statistical models, and an effective
data analyst needs to have a comprehensive understanding of them
all.
• In each scenario, you should be able to identify not only which
model will help best answer the question at hand, but also which
model is most appropriate for the data you’re working with
B)You will be better able to prepare
your data for analysis.
• Data is rarely ready for analysis in its raw form. To ensure your
analysis is accurate and viable, the data must first be cleaned up.
This cleanup often includes organizing the gathered information
and removing “bad or incomplete data” from the sample.
• “Before any statistical model can be completed, you need to explore [and] understand the data. If there is no quality [in the data], then you can’t really derive any insights from it.”
• Once you know how various statistical models work and how
they leverage data, it will become easier for you to
determine what data is most relevant to the question you
are trying to answer, as well.
C) You will become a better communicator.
• In most organizations, data analysts are required
to communicate their findings with two different
audiences.

• The first audience consists of those on the business


team who don’t need to understand the details of
your analysis, but simply want to know the key
takeaways.

• The second audience consists of those who are


interested in the more granular details; this group
will want both the list of broad conclusions and an
explanation of how you reached them.
• Having a thorough understanding of statistical modeling can help you
better communicate with both of these audiences, as you will be better
equipped to reach conclusions and therefore generate better data
visualizations, which are helpful in communicating complex ideas to
non-analysts. Simultaneously, a complex understanding of how these
models work on the backend will allow you to generate and explain
those more granular details when necessary.
Probability:
Probability theory developed from the study of games of chance like dice and cards.

Probability theory is the foundation of statistical inference.

Probability = (the number of ways of achieving success) / (the total number of possible outcomes)
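A tiny worked example of this formula: the probability of rolling an even number with a fair six-sided die is 3/6 = 1/2.

```python
# Probability = successes / total possible outcomes, for a fair die.
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
successes = [o for o in outcomes if o % 2 == 0]

probability = Fraction(len(successes), len(outcomes))
print(probability)   # 1/2
```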
Probability
• Probability is an intuitive concept. We use it on a daily basis without necessarily realizing that we are speaking and applying probability in our work.
• Life is full of uncertainties. We don’t know the
outcomes of a particular situation until it
happens. Will it rain today? Will I pass the next
math test? Will my favorite team win the toss?
Will I get a promotion in next 6 months? All
these questions are examples of uncertain
situations we live in.
•Experiment – an uncertain situation, which could have multiple outcomes. Whether it rains on a given day is an experiment.
•Outcome – the result of a single trial. So, if it rains today, the outcome of today’s trial of the experiment is “It rained”.
•Event – one or more outcomes from an experiment. “It rained” is one of the possible events for this experiment.
•Probability – a measure of how likely an event is. So, if there is a 60% chance that it will rain tomorrow, the probability of the outcome “it rained” for tomorrow is 0.6.
Why do we need probability?

In an uncertain world, it can be of immense help to know and understand the chances of various events. You can plan things accordingly. If it’s likely to rain, I would carry my umbrella. If I am likely to have diabetes on the basis of my food habits, I would get myself tested. If my customer is unlikely to pay me a renewal premium without a reminder, I would remind him about it.
Probability Distribution:

It is simply a statistical function that describes the possible values a random variable can take within a given range and the likelihood of each.
A probability distribution is a summary of probabilities for the values of a random variable.

As a distribution, the mapping of the values of a random variable to a probability has a shape when all values of the random variable are lined up.

The distribution also has general properties that can be measured.

Two important properties of a probability distribution are the expected value and the variance.
Expected Value:

The expected value is the average or mean value of a random variable X.

It is the probability-weighted average of all possible values (not necessarily the single outcome with the highest probability).

It is typically denoted as a function of the uppercase letter E with square brackets: for example, E[X] for the expected value of X, or E[f(x)] where the function f() is used to sample a value from the domain of X.
The expectation value (or the mean) of a random variable X is denoted by E(X).

•Expected Value. The average value of a random variable.
Variance:

The variance is the spread of the values of a random variable from the mean.

This is typically denoted as a function Var; for example, Var(X) is the variance of the random variable X, or Var(f(x)) for the variance of values drawn from the domain of X using the function f().

•Variance. The average spread of values around the expected value.
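As a small worked illustration, E[X] and Var(X) can be computed directly from the distribution of a fair six-sided die.

```python
# E[X] and Var(X) for a fair six-sided die, from the distribution itself.
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
probs  = np.full(6, 1 / 6)                             # uniform probabilities

expected = np.sum(values * probs)                      # E[X] = 3.5
variance = np.sum(probs * (values - expected) ** 2)    # Var(X) ≈ 2.92
print(expected, variance)
```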
The structure of the probability distribution will
differ depending on whether the random variable
is discrete or continuous.

Discrete Probability Distributions

A discrete probability distribution summarizes the probabilities for a discrete random variable.

The probability mass function, or PMF, defines the probability distribution for a discrete random variable.
Discrete probability functions are also known as
probability mass functions and can assume a
discrete number of values.
A discrete probability distribution has a cumulative
distribution function, or CDF.
This is a function that assigns a probability that a
discrete random variable will have a value of less
than or equal to a specific discrete value.
•Probability Mass Function. Probability for a
value for a discrete random variable.
•Cumulative Distribution Function. Probability
less than or equal to a value for a random variable.

For example, coin tosses and counts of events are discrete functions. These are discrete distributions because there are no in-between values.

For example, the likelihood of rolling a specific number on a die is 1/6. The total probability for all six values equals one. When you roll a die, you inevitably obtain one of the possible values.
Types of Discrete Distribution

There are a variety of discrete probability distributions that you can use to model different types of data. The correct discrete distribution depends on the properties of your data.

•Binomial distribution to model binary data, such as coin tosses.
•Poisson distribution to model count data, such as the count of library book checkouts per hour.
•Uniform distribution to model multiple events with the same probability, such as rolling a die.
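An illustrative sketch of these three distributions with scipy.stats; the parameter values (number of tosses, mean checkouts per hour, die faces) are assumptions chosen for the example.

```python
# PMFs of the three discrete distributions named above.
from scipy import stats

# Binomial: probability of exactly 6 heads in 10 fair coin tosses.
print(stats.binom.pmf(6, n=10, p=0.5))

# Poisson: probability of exactly 3 checkouts in an hour, if the mean is 5.
print(stats.poisson.pmf(3, mu=5))

# Discrete uniform: probability of rolling any given face of a fair die.
print(stats.randint.pmf(4, low=1, high=7))   # 1/6 for each face 1..6
```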
Continuous Probability Distributions

Continuous probability functions are also known as probability density functions (PDF).

Unlike discrete probability distributions, where each particular value has a non-zero likelihood, specific values in continuous distributions have a zero probability.
For example, the likelihood of measuring a
temperature that is exactly 32 degrees is zero.
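A short sketch of this point with scipy.stats, assuming (purely for illustration) that temperatures are normally distributed with mean 30 and standard deviation 5: the density at 32 is positive, an interval around 32 has positive probability, but the exact value 32 has probability zero.

```python
# For a continuous variable, exact values have zero probability;
# intervals have probability given by the CDF.
from scipy import stats

temp = stats.norm(loc=30, scale=5)

print(temp.pdf(32))                     # density at 32 degrees (not a probability)
print(temp.cdf(32.5) - temp.cdf(31.5))  # probability of 31.5–32.5 degrees
# P(X == 32) itself is zero for a continuous distribution.
```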
What Is Correlation?

• Correlation is a statistical measure.

• Correlation explains how one or more variables are related to each other. These variables can be input data features which have been used to forecast our target variable.
• Two features (variables) can be positively
correlated with each other. It means that when
the value of one variable increases then the
value of the other variable(s) also increases.
Correlation is really one of the very basics of data
analysis and is an important tool for a data
analyst, as it can help define trends, make
predictions and uncover root causes for certain
phenomena.
There could be essentially two types of data you
can work with when determining correlation:

Univariate Data:

• In a simple set-up we work with a single variable.
We measure central tendency to enquire about the
representative data, dispersion to measure the
deviations around the central tendency, skewness
to measure the shape and size of the distribution
and kurtosis to measure the concentration of the
data at the central position. This data, relating to
a single variable is called univariate data.
Bivariate data:

It often becomes essential in our analysis to study two variables simultaneously.

For example, a) height and weight of a person, b) age and blood pressure, etc.
Statistical data on two characteristics of an individual, measured simultaneously, are termed bivariate data.
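As a small bivariate illustration (with made-up measurements), the Pearson correlation coefficient between height and weight can be computed directly with NumPy.

```python
# Correlation between two variables measured on the same individuals.
import numpy as np

height_cm = np.array([150, 160, 165, 172, 180])
weight_kg = np.array([52, 58, 63, 70, 78])

# Pearson correlation coefficient: close to +1 means a strong positive
# relationship (taller people tend to be heavier in this toy data).
print(np.corrcoef(height_cm, weight_kg)[0, 1])
```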
Types of correlation:
1. Positive correlation
2. Negative correlation
3. Zero correlation
4. Spurious correlation
5. Perfect positive
6. Perfect negative

Positive correlation:
If, due to an increase in one of the two data, the other data also increases, we say that those two data are positively correlated.

For example, height and weight of a male or female are positively correlated.
Negative correlation:
If, due to an increase in one of the two, the other decreases, we say that those two data are negatively correlated.
For example, the price and demand of a
commodity are negatively correlated. When the
price increases, the demand generally goes down.
Zero correlation:

If, between the two data, there is no clear-cut trend, i.e. the change in one does not guarantee a co-directional change in the other, the two data are said to be non-correlated, or may be said to possess zero correlation.
For example, qualities like affection and kindness are in most cases non-correlated with academic achievement; or, better to say, the intellect of a person is purely non-correlated with complexion.
Spurious correlation:

• If the correlation is due to the influence of some other ‘third’ variable, the data are said to be spuriously correlated.

For example, children with “body control problems” and clumsiness have been reported as being associated with adult obesity. One can probably say that uncontrolled and clumsy kids participate less in sports and outdoor activities, and that is the ‘third’ variable here. Most times, it is difficult to figure out the ‘third’ variable, and even if that is achieved, it is even more difficult to gauge the extent of its influence on the two primary variables.
Regression

Regression is a statistical technique that is used to model the relationship of a dependent variable with respect to one or more independent variables.

Regression is widely used in several statistical analysis problems and it is also one of the most important tools in Machine Learning.

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).

Regression helps investment and financial managers to value assets and understand the relationships between variables, such as commodity prices and the stocks of businesses dealing in those commodities.
The statistical technique that expresses a functional relationship between two or more variables in the form of an equation, to estimate the value of a variable based on the given value of another variable, is regression analysis.

The variable whose value is to be estimated is called the Dependent Variable.
The variable whose value is used to estimate this value is called the Independent Variable.
Regression Analysis:

Regression analysis is used in stats to find trends in data.
For example, you might guess that there’s a connection between how much you eat and how much you weigh; regression analysis can help you quantify that.
Regression analysis will provide you with an equation for a graph so that you can make predictions about your data.
For example, if you’ve been putting on weight over the last few years, it can predict how much you’ll weigh in ten years’ time if you continue to put on weight at the same rate.
In statistics, it’s hard to stare at a set of random
numbers in a table and try to make any sense of it.
For example, global warming may be
reducing average snowfall in your town and you
are asked to predict how much snow you think will
fall this year. Looking at the following table you
might guess somewhere around 10-20 inches.
That’s a good guess, but you could make
a better guess, by using regression.
Linear Regression

A linear regression refers to a regression model that is completely made up of linear variables.

Beginning with the simple case, Single Variable Linear Regression is a technique used to model the relationship between a single input independent variable (feature variable) and an output dependent variable using a linear model, i.e. a line.

Multi-Variable Linear Regression creates a model for the relationship between multiple independent input variables (feature variables) and an output dependent variable. The model remains linear in that the output is a linear combination of the input variables.
A few key points about Linear Regression:
•Fast and easy to model and is particularly useful
when the relationship to be modeled is not
extremely complex and if you don’t have a lot of
data.
•Very intuitive to understand and interpret.
•Linear Regression is very sensitive to outliers.
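A minimal single-variable linear regression sketch with scikit-learn on made-up data; the feature values and targets are illustrative only.

```python
# Single-variable linear regression on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # independent variable
y = np.array([52.0, 55.0, 59.0, 61.0, 66.0])        # dependent variable

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # fitted line: y = coef * x + intercept
print(model.predict([[6.0]]))             # prediction for a new input
```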
Polynomial Regression

When we want to create a model that is suitable for handling non-linearly separable data, we will need to use polynomial regression.

In this regression technique, the best-fit line is not a straight line.

A few key points about Polynomial Regression:
•Able to model non-linearly separable data; linear
regression can’t do this. It is much more flexible in
general and can model some fairly complex
relationships.
•Full control over the modelling of feature variables
(which exponent to set).
•Requires careful design. Need some knowledge of
the data in order to select the best exponents.
•Prone to overfitting if exponents are poorly selected.
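A minimal polynomial regression sketch using NumPy's polyfit on synthetic, roughly quadratic data; the degree-2 choice is the "which exponent to set" design decision mentioned above.

```python
# Degree-2 polynomial regression on data a straight line cannot capture.
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([1.1, 2.0, 5.2, 9.8, 17.1, 26.0])   # roughly quadratic

# Choosing the exponent (degree) is the careful-design step; too high a
# degree would overfit.
coeffs = np.polyfit(x, y, deg=2)
fitted = np.polyval(coeffs, x)

print(coeffs)                       # [a, b, c] for a*x**2 + b*x + c
print(np.abs(y - fitted).max())     # worst-case fitting error
```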
Ridge Regression

A standard linear or polynomial regression will fail in the case where there is high collinearity among the feature variables.
Collinearity is the existence of near-linear relationships among the independent variables.
The presence of high collinearity can be determined in a few different ways:
•A regression coefficient is not significant even
though, theoretically, that variable should be
highly correlated with Y.
•When you add or delete an X feature variable, the
regression coefficients change dramatically.
•Your X feature variables have high pairwise
correlations (check the correlation matrix).
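A minimal ridge regression sketch with scikit-learn on two deliberately near-collinear synthetic features; alpha is the regularization strength and all data are generated for illustration.

```python
# Ridge regression keeps coefficients stable despite collinear features.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)     # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_, model.intercept_)           # coefficients stay moderate in size
```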
