Data Analysis - INCOMPLETE - 2

The document provides an introduction to data analysis techniques for hypothesis testing and bivariate analysis. It discusses hypothesis testing concepts like the null and alternative hypotheses, p-values, and confidence intervals. It also covers bivariate analysis methods like correlation tests, chi-square tests, t-tests, ANOVA, and examining the relationship between two variables. Grouping techniques like association rule mining and market basket analysis are also introduced.

Uploaded by

vivek

INTRODUCTION TO DATA ANALYSIS

2nd Sem, MCA

SSS Shameem Mar, 2022 Data Analytics, DSCA, MIT

CONTENT
❑ Introduction to Data Analysis
• Hypothesis Testing
✓ Bivariate Analysis: Correlation Test
o Correlation coefficient
o Chi square test
o T - test
o ANOVA
o Summary tables, contingency tables, visualization
✓ Multivariate Analysis
• Grouping
o Association rule mining
o Market Basket Analysis
o Recommendation system
o Apriori algorithm
o FP Growth Algorithm

DATA ANALYSIS – HYPOTHESIS TESTING

• A central purpose of inferential statistics is to test hypotheses.

• Hypothesis: an educated guess about something (it must be testable).
o A proposed explanation made on the basis of limited evidence, used as a starting point for further investigation.
• Hypothesis testing in statistics is a way to check whether observed results are meaningful or could plausibly be due to chance.
• The null hypothesis is assumed to be true initially.
• The alternative hypothesis is usually (but not always) the opposite of the null hypothesis.
o H0: There is no relationship between variables X and Y.
o H1: There is a relationship between variables X and Y.
• Steps in hypothesis testing:
o State the null and alternative hypotheses,
o Choose what kind of test to perform and a significance level,
o Either reject or fail to reject the null hypothesis.


• H₀: null hypothesis; no difference between the two variables (populations); both have the same distribution.
• Hₐ: alternative hypothesis; the two populations (variables) are not equal.
• p-value: if the p-value is less than a specified significance level α (alpha; usually 0.05), the difference is significant and the null hypothesis H₀ is rejected.
o The p-value (probability value) tells how likely it is to observe data at least as extreme as the observed data if the null hypothesis were true.
o The smaller the p-value, the stronger the evidence against the null hypothesis.
o In practice the p-value is never exactly zero, because there is always some possibility that the data arose by chance.
• H₀ is rejected: the two variables are not from the same distribution.

Example (chi-square test): significance level 0.05; degrees of freedom = 2; test statistic = 0.7533
• If the null hypothesis is true, 95 times out of 100 the test statistic will be 5.99 or less (the critical value).
• 0.7533 is less than 5.99 → fail to reject the null hypothesis at the 0.05 significance level.
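
The comparison in the example above can be checked with a short sketch. It assumes the test is a chi-square test with 2 degrees of freedom, for which the distribution function has a simple closed form; the function names are illustrative and the shortcut does not apply to other degrees of freedom.

```python
import math

def chi2_df2_critical(alpha):
    """Critical value of the chi-square distribution with 2 degrees of freedom.

    For df = 2 the CDF is 1 - exp(-x / 2), so the quantile has a closed form.
    """
    return -2.0 * math.log(alpha)

def chi2_df2_p_value(statistic):
    """Upper-tail probability P(X >= statistic) for chi-square with df = 2."""
    return math.exp(-statistic / 2.0)

alpha = 0.05
statistic = 0.7533
critical = chi2_df2_critical(alpha)      # about 5.99
p_value = chi2_df2_p_value(statistic)    # well above alpha
reject_h0 = statistic > critical

print(f"critical = {critical:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision, so both lead to "fail to reject" here.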


• Normal/Gaussian/bell-shaped distribution: a continuous probability distribution that is symmetrical around its mean.
• Most observations cluster around the central peak.
• Probabilities for values further away from the mean taper off equally in both directions.
• Extreme values in both tails of the distribution are similarly unlikely.

• Box–Cox transformation: a power transformation that can make non-normal data approximately normal.

Mean ± standard deviations   Percentage of data contained
1   68%
2   95%
3   99.7%
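
The percentages in the table can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard library; `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"mean ± {k} standard deviations: {share_within(k):.1%}")
```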


• In statistics, a confidence interval is a range of values, computed from sample data, that is expected to contain the population parameter in a certain proportion (percentage) of repeated samples.
• Confidence intervals measure the degree of uncertainty or certainty in an estimate.
• The most common confidence levels are 95% and 99%.
Confidence level = 100 × (1 − 𝛼)
• For a 90% confidence level 𝛼 is 0.1; for a 95% confidence level 𝛼 is 0.05; for a 99% confidence level 𝛼 is 0.01.
• A 95% confidence level means that if the experiment is repeated over and over again, about 95% of the resulting intervals will contain the true parameter.

• Example: a survey is conducted on a group of pet owners to see how many cans of dog food they purchase a year.
Estimating at a 99% confidence level gives a confidence interval of (200, 300) → we are 99% confident that the mean number of cans bought per year lies between 200 and 300.
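
The repeated-experiment reading of a confidence level can be illustrated by simulation. The sketch below is only an illustration under assumed values: the population mean, standard deviation, sample size, and seed are hypothetical, and a z-interval with known standard deviation is used for simplicity.

```python
import random
from statistics import NormalDist, mean

def coverage(trials=2000, n=30, mu=50.0, sigma=10.0, level=0.95, seed=0):
    """Share of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = mean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials

observed_coverage = coverage()
print(f"share of intervals containing the true mean: {observed_coverage:.3f}")
```

The printed share should land close to 0.95; it is the procedure, not any single interval, that carries the 95% guarantee.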


• Confidence Interval (CI) is a range of values we are fairly sure our true value lies in.

• CI can be constructed with

o t-distribution (population standard deviation unknown; the sample standard deviation s is used)
x̄ ± t * s / (√n)

o Normal or z-distribution (population standard deviation σ known)
x̄ ± z * σ / (√n)

• standard error of the sampling distribution = σ / (√n) (estimated by s / (√n) when σ is unknown)

• Since the sample size ‘n’ is in the denominator and the standard deviation is in the numerator

→ small samples with large variation increase the standard error,
this reduces confidence that the sample statistic is a close approximation of the population parameter.


Confidence level   Z-score

80% 1.282
85% 1.440
90% 1.645
95% 1.960
99% 2.576
99.5% 2.807
99.9% 3.291
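
The z-scores in the table follow from the relation confidence level = 1 − α, with α split equally between the two tails. A minimal sketch using the standard library (`z_for_confidence` is an illustrative name):

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z-score for a confidence level given as a fraction."""
    alpha = 1.0 - level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.80, 0.85, 0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"{level:.1%}: z = {z_for_confidence(level):.3f}")
```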


Example: Construct a 98% Confidence Interval based on the following data: 45, 55, 67, 45, 68, 79, 98, 87, 84, 82.

• Step 1: Find the sample mean, x̄, and the sample standard deviation, s, for the data.

s: 18.172; x̄: 71
• Step 2: Subtract 1 from the sample size to find the degrees of freedom (df).
df = 10 – 1 = 9
• Step 3: Find the alpha level for one tail: subtract the confidence level from 1, then divide by two. (1 – .98) / 2 = .01
• Step 4: Look up df and α in the t-distribution table. For df = 9 and α = .01, the table gives 2.821
• Step 5: Apply the CI formula for the t-distribution, x̄ ± t * s / (√n); the margin of error is 2.821 * 18.172 / √10 ≈ 16.21
Lower end of CI range, 71 – 16.21 = 54.79
Upper end of CI range, 71 + 16.21 = 87.21
98% CI is (54.79, 87.21)
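
The steps of this example can be sketched in a few lines. The critical value 2.821 is taken from the table lookup in Step 4 rather than computed, since the standard library has no t-distribution; carrying the arithmetic at full precision gives a margin of error of about 16.21.

```python
import math
import statistics

data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]
t_critical = 2.821  # t-table value for df = 9, one-tail alpha = 0.01 (from Step 4)

n = len(data)
sample_mean = statistics.mean(data)
sample_sd = statistics.stdev(data)               # sample standard deviation (divides by n - 1)
margin = t_critical * sample_sd / math.sqrt(n)   # t * s / sqrt(n)
ci = (sample_mean - margin, sample_mean + margin)

print(f"mean = {sample_mean}, sd = {sample_sd:.3f}, margin = {margin:.2f}")
print(f"98% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```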


Example: Construct a 95% confidence interval for an experiment that found the sample mean temperature for a certain
city in August was 101.82, with a population standard deviation of 1.2. There were 6 samples in this experiment.

• Step 1: Subtract confidence level (Given as 95 percent in question) from 1 and then divide the result by two.
alpha level (represents area in one tail) = (1 – .95) / 2 = .025
• Step 2: Find z-score from z-table : z score = 1.96.
• Step 3: Plug the numbers into the second part of the formula and solve: z * σ / (√n)
= 1.96 * 1.2/√(6) = 1.96 * 0.49 = 0.96
• Step 4: Find the CI:
Lower end of CI range, subtract step 3 from mean = 101.82 – 0.96 = 100.86
Upper end of CI range, add step 3 to mean = 101.82 + 0.96 = 102.78.
CI is (100.86,102.78)
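
The same calculation as a short sketch, with the z-score computed from the standard normal distribution instead of a z-table (variable names are illustrative):

```python
import math
from statistics import NormalDist

sample_mean = 101.82
population_sd = 1.2
n = 6
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z * population_sd / math.sqrt(n)            # z * sigma / sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z = {z:.2f}, margin = {margin:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```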

DATA ANALYSIS – CORRELATION
• Bivariate Analysis: Analysis of any concurrent relation between two variables or attributes.
o Consists of a group of statistical techniques that examine relationship between two variables.
o Bivariate analysis forms foundation of multivariate analysis.
• Correlation: Relation between two variables.
• Bivariate correlation Test: Statistical technique to determine existence of relationships/association
between two different variables (X, Y)
o whether/how much X will change when there is a change in Y.
Types of tests:
o Correlation: check the association between variables.
o Comparison of means: check the differences between means of variables.
o Regression: check if one variable predicts changes in another variable.
o Non-Parametric: tests that are used when data does not meet the assumptions of parametric tests.

Parametric Tests
• Prior knowledge of population distribution (normal) is available.
• Fixed set of parameters used to determine probabilistic model.
• Parameters used in normal distribution: Mean, Standard Deviation
• T-test; Z-test; F-test; ANOVA (post-hoc test)

Non-parametric Tests
• No fixed set of parameters is assumed, and no knowledge of the population distribution (e.g. normal) is required.
• No assumption is made about parameters for the given population.
• Referred to as distribution-free tests.
• Easy to apply and understand; less complex, but usually less powerful than parametric tests when the parametric assumptions hold.
• Chi-square test; Mann-Whitney U-test; Kruskal-Wallis H-test


Correlation Test Selection

• Positive correlation: both variables move in the same direction → an increase in one variable is associated with an increase in the other variable and vice versa.
o spending more time on a treadmill burns more calories.
• Negative correlation: the two variables move in opposite directions → an increase in one variable is associated with a decrease in the other variable and vice versa.
o increasing the speed of a vehicle decreases the time to reach the destination.
• Weak/Zero correlation: the variables show little or no (linear) relationship.
o no correlation between the number of years of school a person has attended and the number of letters in his/her name.

• Correlation coefficient (r) measures the strength and direction of association/co-occurrence (from -1 to +1).
• Pearson (‘r’ or product-moment) correlation coefficient: between two continuous-level variables.
o Positive correlation shows a direct relationship between two variables (the larger A, the larger B).
o Negative correlation shows an inverse relationship (the larger A, the smaller B).
o A zero correlation coefficient indicates no linear relationship between the variables.
o .1 ≤ | r | < .3 … small / weak correlation
o .3 ≤ | r | < .5 … medium / moderate correlation
o .5 ≤ | r | ……… large / strong correlation

Advantages of correlation analysis

• Observe relationships: correlation helps to identify the absence/presence of a relationship between two variables.
• Good starting point for research/analysis.
• Uses for further studies: guides researchers to identify the direction and strength of the relationship between two
variables, so that later studies can narrow the findings down.
• Simple metrics: findings are simple to classify (range from -1.00 to 1.00). There are only three broad
outcomes of the analysis: positive, negative, or no correlation.

Bessel's correction

• Use of ‘n − 1’ instead of ‘n’ in the formula for sample variance and sample standard deviation.
• Corrects the bias in the estimation of the population variance (and reduces the bias in the estimated population standard deviation).
• Except for rare cases (sample mean = population mean), the data will be closer to the sample mean than to
the true population mean.
• So the sum of squared deviations in the numerator will probably be a bit smaller than it would be if the true
population mean were used. To make up for this, divide by ‘n-1’ (a smaller value) rather than ‘n’.
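
The effect of the correction can be seen by computing the variance both ways. This sketch reuses the ten data values from the confidence-interval example and checks the hand calculation against the two conventions available in Python's standard library.

```python
import statistics

sample = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]  # data reused from the CI example

n = len(sample)
sample_mean = statistics.mean(sample)
sum_of_squares = sum((x - sample_mean) ** 2 for x in sample)

uncorrected_variance = sum_of_squares / n          # divides by n
corrected_variance = sum_of_squares / (n - 1)      # Bessel's correction: divides by n - 1

# statistics.pvariance divides by n; statistics.variance divides by n - 1.
matches_pvariance = abs(uncorrected_variance - statistics.pvariance(sample)) < 1e-9
matches_variance = abs(corrected_variance - statistics.variance(sample)) < 1e-9

print(f"divide by n: {uncorrected_variance:.2f}; divide by n - 1: {corrected_variance:.2f}")
```

The corrected value is always the larger of the two, and the gap shrinks as the sample size grows.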


Pearson r correlation (using standard deviations):

r = ∑ (xi − x̄)(yi − ȳ) / ((n − 1) * Sx * Sy)

r = Pearson r correlation coefficient between x and y
n = number of observations
xi = value of x (for ith observation)
yi = value of y (for ith observation)
x̄, ȳ = sample means of x and y
Sx, Sy = S.D. for x and y

Pearson r correlation (computational form):

rxy = (n ∑ xi yi − ∑ xi ∑ yi) / (√(n ∑ xi² − (∑ xi)²) * √(n ∑ yi² − (∑ yi)²))

rxy = Pearson r correlation coefficient between x and y
n = number of observations
xi = value of x (for ith observation)
yi = value of y (for ith observation)
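
A minimal sketch of the Pearson coefficient, written with sample standard deviations (n − 1 in the denominator) as the symbols Sx and Sy suggest. The data are hypothetical, and the "negligible" label for |r| below 0.1 is an assumption added here, since the rule-of-thumb thresholds only start at 0.1.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation using sample standard deviations (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

def strength(r):
    """Rule-of-thumb label; 'negligible' for |r| < 0.1 is an assumed extra category."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "weak"
    return "negligible"

# Hypothetical data: minutes on a treadmill vs calories burned.
minutes = [10, 20, 30, 40, 50]
calories = [95, 210, 290, 405, 500]
r = pearson_r(minutes, calories)
print(f"r = {r:.3f} ({strength(r)})")
```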

CATEGORICAL DATA ENCODING

• Categorical data: variables contain label values rather than numeric values.
• Number of possible values is often limited to a fixed set.
• Each value represents a different category.
• Categorical variables also called nominal (ordinal, if ordered).
o variable “pet” with values: “dog” and “cat”.
o Variable “color” with values: “red“, “green” and “blue”.
o variable “rank” with values: “first”, “second” and “third” (ordinal)
• Many machine learning algorithms (data analytics) cannot operate on label data directly (they require all
input/output variables to be numeric).


Integer/Label Encoding
• Each unique category value is assigned an integer value.
• “red” is 1, “green” is 2, and “blue” is 3.
• Easily reversible.
• Such integer values have a natural ordered relationship between each other → machine learning algorithms
tend to understand and harness this relationship.
o For some variables/analysis (ordinal), this may be enough/good.
• For nominal data label encoding is not enough/good.
o using such encoding and allowing the model to assume natural ordering between categories may result in
poor performance or unexpected results.
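
Integer/label encoding can be sketched without any library. The helper below is hypothetical; it assigns integers in order of first appearance, starting at 1, which reproduces the red/green/blue example.

```python
def fit_label_encoder(values):
    """Map each unique category to an integer, in order of first appearance, starting at 1."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping

colors = ["red", "green", "blue", "green", "red"]
mapping = fit_label_encoder(colors)
encoded = [mapping[color] for color in colors]

# The encoding is easily reversible by inverting the mapping.
inverse = {code: label for label, code in mapping.items()}
decoded = [inverse[code] for code in encoded]

print(mapping)
print(encoded)
```

Note that the integers imply red < green < blue, an ordering that means nothing for a nominal variable such as color.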


One-Hot Encoding
• New binary variable is added for each unique categorical data value.
• Original variable is discarded.
o In “color” variable example, there are 3 categories.
o 3 binary variables are added.
o “1” value is placed in the binary variable for respective color and “0” values for all other color variables.
Color Red Green Blue
Red 1 0 0
Green 0 1 0
Blue 0 0 1
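
The table above can be generated with a small helper. This is a sketch in plain Python; `one_hot` is an illustrative name, not a library function.

```python
def one_hot(value, categories):
    """Return a list with 1 in the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]

categories = ["Red", "Green", "Blue"]
encoded = {color: one_hot(color, categories) for color in categories}

print("Color", *categories)
for color, vector in encoded.items():
    print(color, *vector)
```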

Summary Table
• Visualization that summarizes statistical information about data in table form.

Contingency table:
• Also called crosstabs or two-way tables.
• Tabular representation of categorical data.
• Used in statistics to summarize the relationship between two (or more) categorical variables.
• Special type of frequency distribution table, where two variables are shown simultaneously.
• Usually shows frequencies for particular combinations of values of two discrete random variables X and Y.
• Each cell in the table represents a mutually exclusive combination of X-Y values.

Raw data:
Gender Result
Male Pass
Female Pass
Male Fail
Male Fail
Male Pass
Female Pass
Female Fail

Resulting contingency table (counts):
Gender Pass Fail Total
Male 2 2 4
Female 2 1 3
Total 4 3 7
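
Building the contingency table from the raw Gender/Result records amounts to counting each combination. A minimal sketch with the standard library:

```python
from collections import Counter

records = [
    ("Male", "Pass"), ("Female", "Pass"), ("Male", "Fail"), ("Male", "Fail"),
    ("Male", "Pass"), ("Female", "Pass"), ("Female", "Fail"),
]

counts = Counter(records)            # frequency of each (gender, result) combination
genders = ["Male", "Female"]
results = ["Pass", "Fail"]

table = [[counts[(gender, result)] for result in results] for gender in genders]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

print("Gender", *results, "Total")
for gender, row, total in zip(genders, table, row_totals):
    print(gender, *row, total)
print("Total", *col_totals, sum(row_totals))
```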

DATA ANALYSIS – CHI SQUARE
• Pearson’s chi-square test.
• Primary use of the chi-square test is to examine whether two categorical variables are independent (not related) or not.
o If two variables are related, the distribution of one variable changes with the value of the other.
o One variable is "independent of" the other if knowing the value of one gives no information about
the value of the other.
• The chi-square statistic is based on the difference between the observed data and what would be
expected if there were truly no relationship between the variables.
• Null and alternative Hypothesis:
o H0: There is no relationship between X and Y variable.
o H1: There is a relationship between X and Y variable.

• Calculation of the chi-square statistic: χ² = ∑ (Oi – Ei)² / Ei

o Oi = observed frequency (observed counts in the cells)

o Ei = expected frequency (if NO relationship existed between the variables); Ei = (row total * column total) / sample size

• The chi-square statistic cannot be negative (it indicates WHETHER the variables are related or not; it does not indicate directionality).
• degrees of freedom = (r-1)*(c-1)

o r, c: number of rows and columns (response categories) in the contingency table

• Compare the calculated chi-square value with the critical value from the chi-square distribution table, for the given
degrees of freedom and significance level (alpha), to decide whether the variables are related or not.
o Reject or fail to reject the null hypothesis.
o Chi-square calculated value > chi-square critical value → reject the null hypothesis.
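
The calculation can be sketched as a small function. As input it uses the Gender × Result counts from the contingency-table slide; the critical value 3.841 is the standard chi-square table entry for 1 degree of freedom at alpha = 0.05. The function name is illustrative.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected counts)
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count per cell: row total * column total / sample size.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

# Rows: Male, Female; columns: Pass, Fail.
observed = [[2, 2], [2, 1]]
statistic, dof, expected = chi_square_independence(observed)

critical_value = 3.841  # chi-square table value for df = 1, alpha = 0.05
print(f"chi-square = {statistic:.4f}, df = {dof}")
print("reject H0" if statistic > critical_value else "fail to reject H0")
```

With only seven observations the expected counts are far below the usual rule of thumb of 5 per cell, so this tiny table only illustrates the arithmetic; the chi-square approximation would not be trusted on it in practice.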

• Is gender independent of education level? A random sample of 395 people was surveyed and each person
was asked to report the highest education level they obtained. The data that resulted from the survey are
summarized in the following table:
[Table: observed counts of gender by highest education level]

• Question: Are gender and education level dependent at the 5% level of significance (95% confidence)?
• In other words, given the data collected above, is there a relationship between the gender of an individual
and the level of education that they have obtained?

[Tables: actual (observed) data and expected data]

o H0: There is no relationship between X and Y variable.

o H1: There is a relationship between X and Y variable.
• The calculated χ² statistic is 8.006.
• The critical value of χ² with 3 degrees of freedom at α = 0.05 is 7.815.
• 8.006 > 7.815 → reject the null hypothesis.
• Education level depends on gender (the two variables are related) at the 5% significance level.

DATA ANALYSIS – T-TEST
One-Sample T – test
• Compares the mean of sample data to a known value.
o Example, one might want to know how sample mean compares to population mean.
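• The one-sample t-test is used when the population standard deviation is not known or the sample size is small.
• H0: the sample comes from a population with mean μ (there is no difference between the sample and population means)
• H1: the sample comes from a population with mean greater than μ (one-tailed; a two-tailed test uses "different from μ")
• Test statistic: t = (x̄ − μ) / (S / √N), with N − 1 degrees of freedom
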
o x̄ : sample mean
o μ : population mean
o S : sample standard deviation
o N : Number of observations

Example: your company wants to improve sales. Past sales data indicate that the average sale was $100 per
transaction. After training your sales force, recent sales data (taken from a sample of 25 salesmen) indicate an
average sale of $130, with a standard deviation of $15. Did the training work? Test your hypothesis at a
95% confidence level.
sample mean (x̄) = $130.
population mean (μ) = $100 (from past data).
sample standard deviation (s) = $15.
number of observations (n) = 25.
H0: the mean sale after training is still $100
H1: the mean sale after training is greater than $100
calculated t-value = (130 – 100) / (15 / √25) = 30 / 3 = 10
degrees of freedom: 25 – 1 = 24.
Alpha = 0.05 (one-tailed)
Critical t-value = 1.711

calculated t-value > critical t-value → reject the null hypothesis (it is highly likely that the mean sale has
increased → the sales training was probably a success)

Example: A company wants to test the claim that their batteries last more than 40 hours. A simple
random sample of 15 batteries yielded a mean of 44.9 hours, with a standard deviation of 8.9 hours. Test this
claim using a significance level of 0.05.

H0: μ = 40
H1: μ > 40

calculated t-value = (44.9 – 40) / (8.9 / √15) ≈ 2.13
degrees of freedom: 15 – 1 = 14; critical t-value (one-tailed, alpha = 0.05) = 1.761
calculated t-value > critical t-value → reject the null hypothesis
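
Both one-sample examples (sales training and battery life) can be reproduced from their summary statistics. In this sketch the critical values are one-tailed t-table entries for alpha = 0.05 (1.711 for 24 degrees of freedom, 1.761 for 14), entered by hand because the standard library has no t-distribution; the function name is illustrative.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t statistic and degrees of freedom for a one-sample t-test from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error, n - 1

examples = {
    "sales training": dict(sample_mean=130, hypothesized_mean=100, sample_sd=15, n=25, critical=1.711),
    "battery life": dict(sample_mean=44.9, hypothesized_mean=40, sample_sd=8.9, n=15, critical=1.761),
}

results = {}
for name, ex in examples.items():
    t_value, dof = one_sample_t(ex["sample_mean"], ex["hypothesized_mean"], ex["sample_sd"], ex["n"])
    reject = t_value > ex["critical"]          # one-tailed (greater-than) test
    results[name] = (t_value, dof, reject)
    print(f"{name}: t = {t_value:.3f}, df = {dof}, reject H0: {reject}")
```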

Two-Sample T – test

• Compares the means of two samples.

o Test the difference (d0) between the two population means, using the sample means.
o To determine whether the means are equal.
o Example; Compare the mean scores of two sections (samples) of a class (population).

• H0: μ1 = μ2 (there is no difference between the population means)

• H1: μ1 ≠ μ2 (there is a difference between the population means)

Assuming unequal variances in the two samples;

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

o x̄1, x̄2 : sample means
o s1², s2² : sample variances
o n1, n2 : number of observations in the two samples

Assuming equal variances in the two samples;

t = (x̄1 − x̄2) / (Sp * √(1/n1 + 1/n2)), where Sp² = ((n1 − 1) s1² + (n2 − 1) s2²) / (n1 + n2 − 2)

o x̄1, x̄2 : sample means
o Sp : pooled sample standard deviation
o n1, n2 : number of observations in the two samples
o s1², s2² : sample variances

• degrees of freedom (equal variances); df = n1 + n2 − 2
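
The two variants of the two-sample statistic can be sketched from summary statistics. The means and standard deviations below are hypothetical, chosen only so that the sample sizes give df = 10 + 13 − 2 = 21; the critical value 2.080 is the two-tailed t-table entry for 21 degrees of freedom at alpha = 0.05, and the function names are illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic assuming equal variances; returns (t, df, pooled sd)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, df, pooled_sd

def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic without the equal-variance assumption."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Hypothetical summary statistics: (mean, standard deviation, sample size) per group.
t_equal, df, sp = pooled_t(20.0, 4.0, 10, 25.0, 5.0, 13)
t_unequal = unequal_variance_t(20.0, 4.0, 10, 25.0, 5.0, 13)

critical = 2.080  # two-tailed t-table value for df = 21, alpha = 0.05
print(f"equal variances: t = {t_equal:.3f}, df = {df}, sp = {sp:.3f}")
print(f"unequal variances: t = {t_unequal:.3f}")
print("reject H0" if abs(t_equal) > critical else "fail to reject H0")
```

The sketch leaves out the degrees of freedom for the unequal-variance case, which are not n1 + n2 − 2 and need a separate approximation.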

Example: Average body fat percentage is a measure of a person’s fitness, and it varies by age. According to some studies, the normal
range for men is 15-20% body fat, and the normal range for women is 20-25%. Sample data are collected from a
group of men and women.
There is some overlap in the data and also some differences. Just by looking at the data, it is hard to draw any solid
conclusions about whether the underlying populations of men and women have the same mean body fat.
Check it statistically.


• Two histograms are on the same scale.

• There are no very unusual points (outliers).
• The data look roughly bell-shaped (a normal distribution seems reasonable).
• Examining the summary statistics, the standard deviations look similar → supports the idea of equal variances.
o This can also be checked using a test for variances.

Assuming equal variances in the two samples

H0: μ1 = μ2
H1: μ1 ≠ μ2

degrees of freedom: df = n1 + n2 − 2 = 10 + 13 − 2 = 21
Alpha = 0.05

Critical t-value = 2.080

|calculated t-value| > critical t-value → reject the null hypothesis → the mean body fat for men and women is NOT equal.

Example: One data set contains miles per gallon for U.S. cars (sample 1) and for Japanese cars (sample 2); the
summary statistics for each sample are shown below.
Apply a t-test to conclude whether the mean fuel economy (miles per gallon) of the two groups of cars is identical (at alpha = 0.05).

Hypothesis to test that the means are equal for two samples.
We assume that the variances for the two samples are equal.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

Test statistic: T = -12.62059

Pooled standard deviation: sp = 6.34260
Degrees of freedom: ν = 326
Significance level: α = 0.05
Critical value = 1.9673
Critical region: Reject H0 if |T| > 1.9673

absolute value of test statistic (12.62059) > critical value (1.9673) → reject null hypothesis
conclude that two sample means are different at 0.05 significance level.
