CS001-B03 - Exploratory Data Analysis 20

Here are the key points to address in this case:
- The sample proportion of customers who made a purchase after receiving the email is 20/100 = 0.2
- To test if this is significantly different from the expected population proportion, a one-sample proportions test (also called a z-test) can be used
- The null hypothesis would be that the sample proportion is equal to the expected population proportion
- The alternative hypothesis would be that the sample proportion is different than the expected population proportion
- Calculate the test statistic and p-value and compare to the significance level (usually α=0.05) to determine if the null hypothesis can be rejected
- If the p-value is less than 0.05

Uploaded by

Viswa Spiritual


Antern Learning

India

CS001-B03 - Exploratory Data Analysis (20%)

Test Paper

1.5 hours 50 marks

Instructions:

● Read each question carefully before answering. Make sure you understand what is being
asked before you start writing your answer.
● Show your work and explain your reasoning. Even if your final answer is incorrect, you
can still receive partial credit if you show your work and explain your thought process.
● Write clearly and legibly. Make sure your handwriting is neat and easy to read. If the
grader can't read your writing, you may lose points even if your answer is correct.
● Manage your time effectively. Allocate your time wisely to ensure that you have enough
time to answer all the questions. Don't spend too much time on one question at the
expense of other questions.
● Review your answers before submitting your exam. Take a few minutes to check your
work and make sure you have answered all the questions. Make any necessary
corrections or additions before submitting your exam.

SECTION-A

Question 1: Which of the following plots is most suitable for visualizing the relationship
between two numerical variables? (1.5 marks)

a) Scatterplot b) Histogram
c) Boxplot d) Bar chart

Question 2: Which of the following is NOT a measure of central tendency? (1.5 marks)

a) Mean b) Median
c) Mode d) Standard deviation

Question 3: What type of hypothesis test is most appropriate when comparing the means of
more than two groups? (1.5 marks)

a) t-test b) Chi-squared test
c) ANOVA d) Pearson correlation

Question 4: In a positively skewed distribution, which of the following is true? (1.5 marks)

a) Mean < Median < Mode b) Mode < Median < Mean
c) Mean = Median = Mode d) Median < Mean < Mode

Question 5: Which dimensionality reduction technique is most commonly used for
transforming data into a lower-dimensional space while preserving as much variance as
possible? (1.5 marks)

a) Principal Component Analysis (PCA)
b) Independent Component Analysis (ICA)
c) t-Distributed Stochastic Neighbor Embedding (t-SNE)
d) Feature selection using mutual information

Question 6: Which of the following is NOT a common technique for outlier detection? (1.5 marks)

a) Z-score method b) Tukey's fences (IQR method)
c) Box-Cox transformation d) DBSCAN clustering

Assertion-and-Reason Type (3 marks)

Each question consists of two statements, namely Assertion (A) and Reason (R). To select the
correct answer, use the following code:

(a) Both Assertion (A) and Reason (R) are true and Reason (R) is a correct explanation of
Assertion (A).
(b) Both Assertion (A) and Reason (R) are true but Reason (R) is not a correct explanation of
Assertion (A).
(c) Assertion (A) is true and Reason (R) is false.
(d) Assertion (A) is false and Reason (R) is true.

Question 1: Assertion (A): Histograms and boxplots are suitable for visualizing the distribution of
numerical variables.
Reason (R): Imputation methods rely on assumptions about the data to estimate
missing values, which can lead to biased results if the assumptions are
incorrect.

Question 2: Assertion (A): Histograms and boxplots are suitable for visualizing the distribution
of numerical variables.
Reason (R): Histograms and boxplots provide insights into the central tendency,
dispersion, and shape of the data distribution.

Matching Question: (6 marks)

Question 1: Match the following statistical measures with their corresponding definitions:

A. Mean
B. Variance
C. Standard deviation
D. Mode

1. A measure of central tendency that represents the typical or average value of a dataset.
2. A measure of the spread or variability of a dataset around its mean.
3. A measure of the average deviation of a dataset from its mean.
4. The most frequently occurring value in a dataset.

Question 2: Match the following statistical techniques with their corresponding applications:

A. Boxplot
B. ANOVA
C. Principal Component Analysis (PCA)
D. T-test

1. Used to identify the presence of statistically significant differences in means between two
groups.
2. Used to visualize the distribution, spread, and skewness of a dataset and to identify
outliers.
3. Used to reduce the dimensionality of a dataset by transforming the original features into
a smaller set of new features.
4. Used to test for the presence of statistically significant differences in means across
multiple groups.

Question 3: Match the following statistical measures with their corresponding formulas:

A. Mean
B. Variance
C. Standard deviation
D. Correlation coefficient

i. ∑xi/n
ii. √(∑(xi - x̄)²/(n-1))
iii. ∑(xi - x̄)²/(n-1)
iv. ∑[(xi - x̄)(yi - ȳ)]/[(n-1) sx sy]

Question 4: Match the following statistical techniques with their corresponding advantages:

A. Parametric tests
B. Non-parametric tests
C. Correlation analysis
D. Regression analysis

i. More powerful than non-parametric tests when assumptions are met.
ii. Less sensitive to outliers and non-normality of data.
iii. Used to measure the strength and direction of the linear relationship between two variables.
iv. Used to predict the value of a dependent variable based on the value of one or more
independent variables.

SECTION-B

Inference based questions: (6 marks)

1. The average salary of employees in a company is $50,000 with a standard deviation of
$5,000. A random sample of 25 employees is selected from the company and their
average salary is found to be $47,500. What can you infer about the population mean
salary and the reliability of the sample mean?

2. A survey is conducted to determine the average amount of time that people spend
watching TV per week. A sample of 50 people is selected and the mean time spent
watching TV per week is found to be 25 hours with a standard deviation of 4 hours. What
is the 95% confidence interval for the population mean time spent watching TV per
week? What can you infer about the population mean time spent watching TV per week?

3. A new drug is developed to reduce high blood pressure. A clinical trial is conducted to
test the efficacy of the drug. A random sample of 50 patients is selected and the mean
reduction in blood pressure is found to be 10 mmHg with a standard deviation of 2
mmHg. What can you infer about the population mean reduction in blood pressure? Is
the sample mean statistically significant?
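
The arithmetic behind questions 1 and 2 can be sketched as follows. This is a minimal illustration, not a model answer: it assumes a normal (z) approximation, which is a common choice for samples of this size, and the function names `z_confidence_interval` and `z_score_of_sample_mean` are hypothetical helpers introduced only for this sketch.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    standard_error = sd / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z_crit * standard_error, mean + z_crit * standard_error


def z_score_of_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """How many standard errors the sample mean lies from the population mean."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


# Question 2: n = 50, sample mean = 25 hours, standard deviation = 4 hours.
ci_low, ci_high = z_confidence_interval(25, 4, 50)

# Question 1: population mean $50,000, sd $5,000, n = 25, sample mean $47,500.
z_salary = z_score_of_sample_mean(47_500, 50_000, 5_000, 25)

print(f"95% CI for TV hours: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z-score of the salary sample mean: {z_salary:.2f}")
```

A t-based interval would be slightly wider; the z version is used here only to keep the sketch within the standard library.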

Short answer based questions: (18 marks)

1. What is the difference between a categorical variable and a numerical variable? Give an
example of each.

2. What is a scatter plot, and what type of data is it used to visualize? How can you interpret
the relationship between variables in a scatterplot?

3. What is a boxplot, and what type of data is it used to visualize? How can you interpret
the information provided by a boxplot?

4. What is the mean, and how is it calculated? What are some limitations of using the mean
as a measure of central tendency?

5. What is the standard deviation, and how is it calculated? How can you interpret the value
of the standard deviation in relation to the distribution of the data?

6. What is an outlier, and why can it be a problem in data analysis? What are some
common methods used to detect outliers in a dataset?

7. What is the interquartile range (IQR), and how can it be used to detect outliers in a
dataset? How does the IQR differ from the standard deviation as a measure of spread?

8. What is a normal distribution, and why is it important in data analysis? How can you
determine if a dataset follows a normal distribution?

9. What is a skewed distribution, and what are the different types of skewness that can
occur in a dataset? How can you interpret the skewness of a dataset in relation to its
distribution?

10. What is a correlation coefficient, and how is it calculated? What is the range of possible
values for a correlation coefficient, and how can you interpret the strength and direction
of the correlation?

11. What is a scatterplot matrix, and how is it used to visualize correlations between features
in a dataset? What information can you obtain from a scatterplot matrix that you can't
get from individual scatterplots?

12. What is a null hypothesis, and why is it used in hypothesis testing? What is the
alternative hypothesis, and how does it relate to the null hypothesis?

13. What is a t-test, and what type of data is it used to analyze? What is the difference
between a one-sample t-test and a two-sample t-test?
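
As an illustration of the IQR idea raised in questions 6 and 7, the sketch below flags values outside Tukey's fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR). The data in `sample` are made up for this example, `tukey_outliers` is a hypothetical helper name, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different quartiles.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up illustrative data with one obviously extreme value.
sample = [10, 12, 12, 13, 14, 15, 16, 18, 45]
print(tukey_outliers(sample))
```

Because the quartiles depend only on the middle of the data, the fences barely move when the extreme value changes, which is the sense in which the IQR is more robust than the standard deviation.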

SECTION-C

Case based questions: (14 marks)

Case 1:

A company is trying to determine the effectiveness of a new marketing campaign. They
randomly select 100 customers and send them a promotional email. Of these 100 customers, 20
make a purchase. The company wants to know if the proportion of customers who make a
purchase after receiving the email is significantly different from the proportion who make a
purchase without receiving the email. Use a 5% level of significance.

● What is the null hypothesis and alternative hypothesis for this experiment?
● Calculate the sample proportion of customers who made a purchase after receiving the
email.
● Calculate the standard error of the proportion.
● Calculate the test statistic and corresponding p-value.
● Should the null hypothesis be rejected? What is the conclusion of the hypothesis test?
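
The steps listed above can be sketched as a one-sample z-test for a proportion. The case does not state the purchase proportion without the email, so the baseline `P0_ASSUMED = 0.10` below is purely an assumption made for illustration; the conclusion depends on whatever baseline is actually supplied.

```python
import math
from statistics import NormalDist

N_CUSTOMERS = 100
N_PURCHASES = 20
P0_ASSUMED = 0.10  # ASSUMPTION: baseline proportion, not given in the case
ALPHA = 0.05

# Sample proportion of customers who purchased after the email.
p_hat = N_PURCHASES / N_CUSTOMERS

# Standard error of the proportion under the null hypothesis p = p0.
se_null = math.sqrt(P0_ASSUMED * (1 - P0_ASSUMED) / N_CUSTOMERS)

# Test statistic and two-sided p-value from the standard normal distribution.
z_stat = (p_hat - P0_ASSUMED) / se_null
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

reject_null = p_value < ALPHA
print(f"p_hat={p_hat:.2f}, z={z_stat:.3f}, p-value={p_value:.5f}, reject H0: {reject_null}")
```

Here H0 is p = p0 and H1 is p ≠ p0; with a different assumed baseline the same code applies unchanged.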

Case 2:

A company is conducting a study to determine if there is a relationship between age and income.
They collect data from 200 people and record their age (in years) and income (in thousands of
dollars). The data is shown below:

Age (years)    Income (thousands of dollars)
25             40
30             50
35             60
40             55
45             70
50             65
55             75
60             80

● Calculate the sample mean and sample standard deviation for age and income.
● Create a scatterplot of the data. What is the direction of the relationship between age and
income? Is the relationship linear or nonlinear?
● Calculate the correlation coefficient between age and income. What is the strength and
direction of the correlation?
● Conduct a hypothesis test to determine if there is a significant linear relationship
between age and income. Use a 5% level of significance.
● What is the equation of the regression line for predicting income based on age? What is
the predicted income for someone who is 50 years old?
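
One way to work through these calculations is sketched below, using only the eight rows shown in the table (the case mentions 200 people, but only these rows are listed). The significance check compares the t statistic with 2.447, the two-sided 5% critical value for 6 degrees of freedom taken from a standard t-table; `predict_income` is a hypothetical helper name.

```python
import math

ages = [25, 30, 35, 40, 45, 50, 55, 60]
incomes = [40, 50, 60, 55, 70, 65, 75, 80]  # thousands of dollars

n = len(ages)
mean_age = sum(ages) / n
mean_income = sum(incomes) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_age) ** 2 for x in ages)
syy = sum((y - mean_income) ** 2 for y in incomes)
sxy = sum((x - mean_age) * (y - mean_income) for x, y in zip(ages, incomes))

# Sample standard deviations (n - 1 in the denominator).
sd_age = math.sqrt(sxx / (n - 1))
sd_income = math.sqrt(syy / (n - 1))

# Pearson correlation and least-squares regression line.
r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_income - slope * mean_age


def predict_income(age):
    """Predicted income (thousands of dollars) from the fitted line."""
    return intercept + slope * age


# t-test for H0: no linear relationship (rho = 0), with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
T_CRITICAL = 2.447  # two-sided 5% critical value for 6 df, from a t-table
significant = abs(t_stat) > T_CRITICAL

print(f"r={r:.4f}, line: income = {intercept:.3f} + {slope:.4f} * age")
print(f"predicted income at age 50: {predict_income(50):.2f}, t={t_stat:.2f}")
```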

Case 3:

A hospital is conducting a study to compare the effectiveness of two different treatments for a
certain medical condition. They randomly assign 50 patients to receive Treatment A and 50
patients to receive Treatment B. The results are shown below:

Treatment A    Treatment B
10             15
12             14
15             18
16             20
18             17
20             16

● Calculate the sample mean and sample standard deviation for Treatment A and
Treatment B.
● Conduct a two-sample t-test to determine if there is a significant difference in the mean
effectiveness of Treatment A and Treatment B. Use a 5% level of significance.
● What is the null hypothesis and alternative hypothesis for this experiment?
● Calculate the test statistic and corresponding p-value.
● Should the null hypothesis be rejected? What is the conclusion of the hypothesis test?
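
A possible sketch of the two-sample t-test follows, using only the six values per treatment shown in the table (the case mentions 50 patients per group, but only these rows are listed). It assumes a pooled-variance (equal variances) test, which is one common textbook choice, and compares the statistic with 2.228, the two-sided 5% critical value for 10 degrees of freedom from a standard t-table.

```python
import math
from statistics import mean, stdev

treatment_a = [10, 12, 15, 16, 18, 20]
treatment_b = [15, 14, 18, 20, 17, 16]

n_a, n_b = len(treatment_a), len(treatment_b)
mean_a, mean_b = mean(treatment_a), mean(treatment_b)
sd_a, sd_b = stdev(treatment_a), stdev(treatment_b)  # sample sd (n - 1)

# Pooled variance, ASSUMING equal population variances in both groups.
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df

# Test statistic for H0: mean_A = mean_B against H1: mean_A != mean_B.
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean_a - mean_b) / se_diff

T_CRITICAL = 2.228  # two-sided 5% critical value for 10 df, from a t-table
reject_null = abs(t_stat) > T_CRITICAL

print(f"mean A={mean_a:.3f} (sd {sd_a:.3f}), mean B={mean_b:.3f} (sd {sd_b:.3f})")
print(f"t={t_stat:.4f}, df={df}, reject H0: {reject_null}")
```

With equal group sizes the pooled and Welch statistics coincide; only the degrees of freedom differ.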
