Individual assignment for data science

The document outlines an individual assignment for a Master's program in Data Science, focusing on statistical methods for analyzing data. It includes various sampling techniques for studying kidney disease prevalence, probability questions related to alcohol consumption, t-tests and ANOVA for comparing test scores, regression analysis for predicting student performance, and discussions on the Central Limit Theorem and properties of good estimators. Additionally, it requires writing the Bernoulli distribution as an exponential family, highlighting the application of statistical concepts in real-world scenarios.

Uploaded by

teshager8922

Emerald International College

MSc in Data Science (2025)

Individual Assignment (Statistics for Data Science)

1) A hospital in Ethiopia wants to study the prevalence of a rare kidney disease among its
patients. Since testing all patients is too costly and time-consuming, the researchers decide to
use a sampling method. They consider the following options:

1. Simple Random Sampling (SRS): Selecting 200 patients randomly from the hospital
database.
2. Stratified Sampling: Dividing patients into age groups (under 30, 30-50, above 50)
and randomly selecting 100 patients from each group.
3. Cluster Sampling: Choosing 3 departments in the hospital at random and testing all
patients in those departments.
4. Systematic Sampling: Selecting every 10th patient who visits the hospital for a
check-up.

Questions:
(a) Which sampling method ensures proportional representation of different age groups?
(b) If the researchers want to minimize the cost of data collection while still getting a
representative sample, which method might be the best choice?
(c) What potential bias could arise if they use cluster sampling?
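The four designs can be sketched in R on a sampling frame. This is a minimal sketch that assumes a hypothetical frame of 2,000 simulated patients spread over 10 departments. The hospital's real database is not part of the assignment, and the row order of the frame stands in for the order in which patients visit.

```r
# Hypothetical sampling frame: simulated for illustration only
set.seed(2025)
n_patients <- 2000
patients <- data.frame(
  id         = seq_len(n_patients),
  age_group  = sample(c("under 30", "30-50", "above 50"), n_patients, replace = TRUE),
  department = sample(paste0("Dept", 1:10), n_patients, replace = TRUE)
)

# 1. Simple random sampling: 200 patients drawn without replacement
srs <- patients[sample(nrow(patients), 200), ]

# 2. Stratified sampling: 100 patients drawn at random within each age group
strata <- split(patients, patients$age_group)
stratified <- do.call(rbind, lapply(strata, function(s) s[sample(nrow(s), 100), ]))

# 3. Cluster sampling: 3 departments chosen at random, all their patients kept
chosen_depts <- sample(unique(patients$department), 3)
cluster <- patients[patients$department %in% chosen_depts, ]

# 4. Systematic sampling: random start between 1 and 10, then every 10th patient
#    (row order is assumed to be the visiting order)
start <- sample(1:10, 1)
systematic <- patients[seq(start, nrow(patients), by = 10), ]

table(stratified$age_group)  # equal allocation: 100 per age group
table(cluster$department)    # patients per department in the cluster sample
```

The stratified draw above uses equal allocation (100 per group). Proportional allocation would instead size each stratum's draw by that age group's share of the patient population.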

2) The following table shows the frequency of alcohol consumption by age group among a group of
150 adults surveyed:

a) What is the probability that a randomly selected person is in the 30-49 age group?
b) What is the probability that a randomly selected person has consumed alcohol 50
or more times?
c) Given that a person is in the 18-29 age group, what is the probability that they
have consumed alcohol 10-49 times?
d) What is the probability that a randomly selected person is 50+ years old and has
consumed alcohol 50 or more times?
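Each part reduces to a ratio of counts from the contingency table. The helpers below are a minimal sketch. They assume a hypothetical count matrix `tab` with age groups as row names and consumption categories as column names. The survey's actual counts are not reproduced in this document, so no numbers are filled in.

```r
# Hypothetical helpers; `tab` is an assumed matrix of counts
# (rows = age groups, columns = consumption categories).

# Marginal P(age group = row): row total / grand total
p_row <- function(tab, row) sum(tab[row, ]) / sum(tab)

# Marginal P(consumption in any of `cols`): column totals / grand total
p_cols <- function(tab, cols) sum(tab[, cols]) / sum(tab)

# Conditional P(consumption = col | age group = row): cell / row total
p_col_given_row <- function(tab, col, row) tab[row, col] / sum(tab[row, ])

# Joint P(age group = row AND consumption = col): cell / grand total
p_joint <- function(tab, row, col) tab[row, col] / sum(tab)
```

With `tab` holding the survey's counts, part (c), for instance, would be `p_col_given_row(tab, "10-49", "18-29")`, assuming the rows and columns carry those labels.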

3) We have two groups (A and B) with different test scores, and we want to check if their means are
significantly different.

# Generate sample data
set.seed(123)  # For reproducibility

group_A <- c(75, 80, 85, 78, 82, 77, 83, 79, 81, 76)  # Group A scores
group_B <- c(68, 72, 70, 74, 71, 69, 73, 75, 70, 72)  # Group B scores

# Perform two-sample t-test (pooled variance, since var.equal = TRUE)
t_test_result <- t.test(group_A, group_B, alternative = "two.sided", var.equal = TRUE)
print(t_test_result)

###############output###############

Questions Based on the Output

1. What is the null hypothesis (H₀) for this t-test?

2. What is the t-statistic value?
3. What is the p-value, and what does it indicate about statistical significance at α =
0.05?
4. What are the sample means for group A and group B?
5. What is the 95% confidence interval for the difference in means?
6. Based on the results, should we reject the null hypothesis? Why?
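The quantities in these questions come from the pooled two-sample t-test. This sketch recomputes them by hand with the same group_A and group_B scores and reads the matching fields from the `t.test()` result object. It assumes equal population variances, as the `var.equal = TRUE` call does.

```r
group_A <- c(75, 80, 85, 78, 82, 77, 83, 79, 81, 76)
group_B <- c(68, 72, 70, 74, 71, 69, 73, 75, 70, 72)

n_A <- length(group_A)
n_B <- length(group_B)
df_pooled <- n_A + n_B - 2

# Pooled variance: sample variances weighted by their degrees of freedom
s2_pooled <- ((n_A - 1) * var(group_A) + (n_B - 1) * var(group_B)) / df_pooled
se_diff   <- sqrt(s2_pooled * (1 / n_A + 1 / n_B))

diff_means <- mean(group_A) - mean(group_B)
t_manual   <- diff_means / se_diff
p_manual   <- 2 * pt(-abs(t_manual), df_pooled)                    # two-sided p-value
ci_manual  <- diff_means + c(-1, 1) * qt(0.975, df_pooled) * se_diff  # 95% CI

# The same quantities as reported by t.test()
t_test_result <- t.test(group_A, group_B, alternative = "two.sided", var.equal = TRUE)
t_test_result$statistic  # t-statistic
t_test_result$p.value    # p-value
t_test_result$estimate   # sample means of A and B
t_test_result$conf.int   # 95% CI for mean(A) - mean(B)
```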

4) We have test scores from three different study methods (A, B, and C), and we want to check if
there is a significant difference in their means.

# Generate sample data
set.seed(123)  # For reproducibility

group <- rep(c("A", "B", "C"), each = 10)  # 3 groups, 10 scores each

scores <- c(75, 80, 85, 78, 82, 77, 83, 79, 81, 76,  # Method A
            68, 72, 70, 74, 71, 69, 73, 75, 70, 72,  # Method B
            88, 90, 85, 87, 91, 86, 89, 92, 88, 90)  # Method C
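The listing stops before the model is fitted. A minimal sketch, assuming a standard one-way ANOVA fitted with base R's `aov()`, followed by the F-statistic computed by hand as the between-group mean square divided by the within-group mean square:

```r
scores <- c(75, 80, 85, 78, 82, 77, 83, 79, 81, 76,   # Method A
            68, 72, 70, 74, 71, 69, 73, 75, 70, 72,   # Method B
            88, 90, 85, 87, 91, 86, 89, 92, 88, 90)   # Method C
group <- factor(rep(c("A", "B", "C"), each = 10))
study_data <- data.frame(scores, group)

anova_model <- aov(scores ~ group, data = study_data)
summary(anova_model)  # Df, Sum Sq, Mean Sq, F value, Pr(>F)

# F-statistic by hand
k <- nlevels(group)    # number of groups
N <- length(scores)    # total number of observations
group_means <- tapply(scores, group, mean)
ss_between  <- sum(table(group) * (group_means - mean(scores))^2)
ss_within   <- sum((scores - group_means[group])^2)
f_manual    <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual    <- pf(f_manual, k - 1, N - k, lower.tail = FALSE)
```

A large F means the spread between the group means is large relative to the spread within groups.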
Questions Based on the Output

1. What is the null hypothesis (H₀) for this ANOVA test?

2. What is the F-statistic value, and what does it indicate?
3. What is the p-value, and what does it suggest about the means of the groups at α =
0.05?
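The pairwise questions below refer to a post-hoc comparison whose call is not shown. A sketch, assuming Tukey's HSD via base R's `TukeyHSD()`; the original does not state which post-hoc test was used.

```r
scores <- c(75, 80, 85, 78, 82, 77, 83, 79, 81, 76,   # Method A
            68, 72, 70, 74, 71, 69, 73, 75, 70, 72,   # Method B
            88, 90, 85, 87, 91, 86, 89, 92, 88, 90)   # Method C
group <- factor(rep(c("A", "B", "C"), each = 10))
anova_model <- aov(scores ~ group, data = data.frame(scores, group))

tukey_result <- TukeyHSD(anova_model, conf.level = 0.95)
# One row per pair ("B-A", "C-A", "C-B"); `diff` is the second group's mean
# minus the first's, `lwr`/`upr` bound the family-wise 95% interval,
# and `p adj` is the multiplicity-adjusted p-value
tukey_result$group
```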

Questions Based on the Post-hoc (Pairwise Comparison) Output

1. Which group pairs show statistically significant differences?

2. What is the mean difference between groups A and B? Is it statistically significant?
3. If a pairwise comparison has a p-value greater than 0.05, what does it mean in terms
of significance?

5) Let's assume we have a dataset where we predict student test scores based on study
hours and sleep hours.
Questions Based on the Output

1. What is the regression equation based on the model output?

2. What does the coefficient for study_hours (4.7854) indicate about its relationship
with test scores?
3. Is sleep_hours a statistically significant predictor? Why or why not?
4. What is the R-squared value, and what does it tell us about the model’s fit?
5. If another student studies for 8 hours and sleeps for 6 hours, what would the model
predict for their test score?
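The regression dataset and its output are not reproduced here, so this sketch uses purely hypothetical simulated data to show where each quantity in these questions comes from. Its coefficients will not match the 4.7854 quoted above. The data-generating process is an assumption chosen only for illustration.

```r
# Hypothetical data, simulated only to illustrate the workflow
set.seed(123)
n <- 50
study_hours <- runif(n, min = 1, max = 10)
sleep_hours <- runif(n, min = 4, max = 9)
# Assumed data-generating process (illustration only)
test_score  <- 40 + 5 * study_hours + 0.5 * sleep_hours + rnorm(n, sd = 5)
students    <- data.frame(test_score, study_hours, sleep_hours)

model <- lm(test_score ~ study_hours + sleep_hours, data = students)
summary(model)            # coefficients, t-tests, p-values, R-squared
summary(model)$r.squared  # share of variance in test_score explained

# Prediction for a student who studies 8 hours and sleeps 6 hours
new_student <- data.frame(study_hours = 8, sleep_hours = 6)
predict(model, newdata = new_student)

# Same value from the fitted equation: b0 + b1 * 8 + b2 * 6
b <- coef(model)
b[["(Intercept)"]] + b[["study_hours"]] * 8 + b[["sleep_hours"]] * 6
```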

6) Discuss the Central Limit Theorem.
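The theorem can be illustrated by simulation: draw many samples from a skewed population and look at the distribution of their means. This sketch assumes an Exp(1) population (mean 1, standard deviation 1) and samples of size 30; both are illustrative choices.

```r
set.seed(42)
n      <- 30      # size of each sample
n_sims <- 10000   # number of repeated samples

# Sampling distribution of the mean from a right-skewed Exp(1) population
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

mean(sample_means)  # close to the population mean, 1
sd(sample_means)    # close to sigma / sqrt(n) = 1 / sqrt(30)

# Share of sample means within 1.96 standard errors of the population mean;
# near 0.95 when the normal approximation holds
se <- 1 / sqrt(n)
mean(abs(sample_means - 1) < 1.96 * se)
```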

7) Discuss the properties of a good estimator.
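Three of the usual properties (unbiasedness, efficiency, consistency) can be checked by simulation. A sketch assuming normal data with mu = 10, sigma = 2 and samples of size 20, all illustrative choices:

```r
set.seed(1)
mu <- 10; sigma <- 2; n <- 20; n_sims <- 5000

est <- replicate(n_sims, {
  x <- rnorm(n, mean = mu, sd = sigma)
  c(mean   = mean(x),
    median = median(x),
    var_n1 = var(x),                 # divides by n - 1: unbiased for sigma^2
    var_n  = var(x) * (n - 1) / n)   # divides by n: biased downward
})

# Unbiasedness: average of each estimator vs. the truth (mu = 10, sigma^2 = 4)
rowMeans(est)

# Efficiency: for normal data mean and median are both unbiased for mu,
# but the mean has the smaller sampling variance
eff <- apply(est[c("mean", "median"), ], 1, var)
eff

# Consistency: the variance of the sample mean shrinks as the sample size grows
var_of_mean <- sapply(c(10, 100, 1000),
                      function(m) var(replicate(2000, mean(rnorm(m, mu, sigma)))))
var_of_mean
```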

8) Write the Bernoulli distribution as a member of the exponential family.
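One standard derivation, writing the Bernoulli pmf with success probability θ and rearranging it into the canonical one-parameter form p(x | η) = h(x) exp(η T(x) − A(η)):

```latex
\begin{aligned}
p(x \mid \theta) &= \theta^{x}(1-\theta)^{1-x}, \qquad x \in \{0, 1\},\; 0 < \theta < 1 \\
&= \exp\bigl( x \log\theta + (1-x)\log(1-\theta) \bigr) \\
&= \exp\Bigl( x \log\frac{\theta}{1-\theta} + \log(1-\theta) \Bigr)
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
\eta &= \log\frac{\theta}{1-\theta} \quad (\text{log-odds}), \qquad \theta = \frac{1}{1+e^{-\eta}}, \\
T(x) &= x, \qquad h(x) = 1, \\
A(\eta) &= -\log(1-\theta) = \log\bigl(1+e^{\eta}\bigr), \qquad A'(\eta) = \frac{e^{\eta}}{1+e^{\eta}} = \theta = \mathbb{E}[X].
\end{aligned}
```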
