Attachment 1
Problem 1.
a. Let X1 and X2 be two independent random variables from N(0.5,1). Define X = (X1 +
X2)/2. What is the distribution of X? Based on the distribution of X, find p and q such
that p = P(X > 2.3) and P(X > q) = 0.176 using appropriate R functions.
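For reference, a minimal sketch of the exact calculation, using the fact that X is normal with mean 0.5 and variance 1/2 (so sd = sqrt(0.5)):
    # X = (X1 + X2)/2 with X1, X2 iid N(0.5, 1), so X ~ N(0.5, 1/2)
    p <- pnorm(2.3, mean = 0.5, sd = sqrt(0.5), lower.tail = FALSE)   # P(X > 2.3)
    q <- qnorm(0.176, mean = 0.5, sd = sqrt(0.5), lower.tail = FALSE) # P(X > q) = 0.176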
b. Following part (a), we seek to approximate p and q using the Monte Carlo method. Run
the R code set.seed(0) first. Generate a 1000 by 2 matrix (call it samp) from N(0.5,1). Then
the row means of samp (call them sampx) form a sample of X of size 1000. Approximate p
and q based on the sample sampx.
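A minimal sketch of this Monte Carlo approximation (the names p.hat and q.hat are illustrative):
    set.seed(0)
    samp  <- matrix(rnorm(1000 * 2, mean = 0.5, sd = 1), nrow = 1000, ncol = 2)
    sampx <- rowMeans(samp)              # sample of X of size 1000
    p.hat <- mean(sampx > 2.3)           # approximates p = P(X > 2.3)
    q.hat <- quantile(sampx, 1 - 0.176)  # approximates q, since P(X > q) = 0.176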
c. Using similar ideas as in part (b), now approximate p = P(X1 + ··· + X10 > 5), where
X1,···,X10 are i.i.d. random variables from Beta(1,2).
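One possible sketch following the same pattern (the sample size of 1000 and the reuse of set.seed(0) are assumptions, not requirements stated above):
    set.seed(0)
    sampb <- matrix(rbeta(1000 * 10, shape1 = 1, shape2 = 2), nrow = 1000, ncol = 10)
    p.hat <- mean(rowSums(sampb) > 5)  # approximates P(X1 + ... + X10 > 5)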
Problem 2. Assume you have a function check.prime that can be used to check whether a
given natural number is a prime. [Modify your own code or one of mine posted in the
solution of homework 3].
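If you need a starting point, here is a minimal trial-division sketch of check.prime (your homework 3 version may differ):
    check.prime <- function(n) {
      if (n < 2) return(FALSE)
      if (n < 4) return(TRUE)          # 2 and 3 are prime
      all(n %% 2:floor(sqrt(n)) != 0)  # no divisor up to sqrt(n)
    }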
a. Based on the function check.prime, create a new function that can count the number
of primes smaller than or equal to the given natural number. Name this new function
count.prime.
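A minimal sketch of count.prime, assuming check.prime behaves as described above:
    count.prime <- function(n) {
      if (n < 2) return(0)
      sum(sapply(2:n, check.prime))  # number of primes in 2, ..., n
    }
    count.prime(10)  # 4, since 2, 3, 5, 7 are the primes <= 10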
c. Plot y versus x. On the same figure, add a curve of z versus x. Distinguish the two curves
with different line types and colors. Add a legend, too. In a separate figure, plot y/z
versus x. Comment on the relationship between y and z when x is large.
Search “Prime number theorem” for a reference to help make your comments.
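A plotting sketch, assuming x, y, and z were defined earlier (e.g., y = count.prime evaluated over a grid x and z = x/log(x), in line with the prime number theorem reference); adjust the names to match your own variables:
    plot(x, y, type = "l", lty = 1, col = "red", xlab = "x", ylab = "count")
    lines(x, z, lty = 2, col = "blue")
    legend("topleft", legend = c("y", "z"), lty = 1:2, col = c("red", "blue"))
    plot(x, y / z, type = "l", xlab = "x", ylab = "y/z")  # separate figure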
Problem 3. Interval-censored data are a special type of survival data. The main feature of
such data is that the failure time T of interest (response variable) is not observed exactly
but is known to fall within some interval (L,R]. For example, HIV status is only detectable by
some laboratory examinations when patients visit clinics. Thus, the HIV infection time is
only known to fall between the last visit time with negative examination result and the first
visit time with positive result. For a specific subject i, if Li = 0, we say Ti is left-censored; if Ri
= ∞, Ti is right-censored; otherwise, Ti is interval-censored. So interval-censored data
actually contains a mixture of left-, interval-, and right-censored observations.
In this problem, we consider a breast-cancer data set, in which the failure time is
defined as the time of the occurrence of breast retraction among early breast cancer
patients. Download the file DataBreastCancer.txt and answer the following questions. The
file contains three columns with observations from 94 patients. The first two columns show
the observed interval for the failure time, measured in months. The third column shows
the group indicator taking 1 or 0 (representing two treatments) for each patient.
a. Read the data into R and save the data as databc. [Pay attention to the missing values].
Make sure that databc is numerical.
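A reading sketch; the header and missing-value code used below are assumptions, so adjust them to the actual file:
    # Assumes whitespace-separated columns with "NA" marking missing values;
    # change header/na.strings if the file differs.
    databc <- read.table("DataBreastCancer.txt", header = FALSE, na.strings = "NA")
    databc <- as.matrix(databc)  # ensure databc is numerical
    str(databc)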
c. All of the missing values in the original file appear in the second column and
actually represent ∞, so the observations with missing values are right-censored
observations. Create a categorical variable named delta, which takes 0, 1, and 2 for
left-, interval-, and right-censored observations, respectively. Calculate the numbers
and percentages of left-, interval-, and right-censored observations in this data set.
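One possible sketch, using the conventions above (L = 0 for left-censored, missing R for right-censored):
    # 0 = left-censored (L = 0), 2 = right-censored (R missing, i.e., infinity),
    # 1 = interval-censored otherwise
    delta <- ifelse(databc[, 1] == 0, 0, ifelse(is.na(databc[, 2]), 2, 1))
    table(delta)                       # counts by censoring type
    100 * table(delta) / nrow(databc)  # percentages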
d. Split the first two columns of databc into two parts (call them data1 and data0) based
on the group indicator (the third column). Find the numbers of observations in data1
and data0, respectively.
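A splitting sketch:
    data1 <- databc[databc[, 3] == 1, 1:2]  # group 1
    data0 <- databc[databc[, 3] == 0, 1:2]  # group 0
    nrow(data1); nrow(data0)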
e. Download the function turnbull.r and use it to estimate the cumulative distribution
function (CDF) of interval-censored data. Read the explanations (sentences following
#) of the input and output. To obtain the CDF estimates for the two groups, use the
following commands: ss1=turnbull(data1); ss0=turnbull(data0). Display ss1 and ss0.
f. Plot the estimated CDFs for both groups with type="s" (plotting as step functions) on
the same figure. Distinguish the two curves by using different line types (solid and
dashed) and colors (red and blue). Also add a legend to indicate the group difference
(“group=1” and “group=0”) for the two curves. Comment on which group shows
better treatment performance, judged by which has the lower CDF curve over time.
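A plotting sketch under the assumption that ss1 and ss0 hold time points in their first column and CDF estimates in their second; check the # comments in turnbull.r for the actual output format:
    plot(ss1[, 1], ss1[, 2], type = "s", lty = 1, col = "red",
         xlab = "time (months)", ylab = "estimated CDF")
    lines(ss0[, 1], ss0[, 2], type = "s", lty = 2, col = "blue")
    legend("bottomright", legend = c("group=1", "group=0"),
           lty = 1:2, col = c("red", "blue"))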
Problem 4. In common practice of statistics, study investigators use 0.05 as the significance
level for hypothesis testing. If the p-value is smaller than 0.05, then one rejects the null
hypothesis; otherwise one fails to reject the null hypothesis. The probability of type 1 error
(i.e., the probability of rejecting the null hypothesis when it is true) is 0.05 in this case.
In many situations, such as gene expression experiments, one tests a number (say k) of
hypotheses simultaneously, H0i vs. H1i, for i = 1,···,k. In such
multiple testing problems, one will have inflated overall type 1 error if using 0.05 for each
individual hypothesis test. To see this point, consider the independent case:
P(reject at least one null hypothesis | all k null hypotheses are true) = 1 − (1 − 0.05)^k,
which becomes large when k is large (e.g., 1 − 0.95^20 ≈ 0.64 for k = 20). Suppose the
p-value for each single hypothesis test is
obtained. A widely used approach in such situations is the Benjamini and Hochberg (1995)
algorithm, which controls the false discovery rate at 0.05. The procedure is as follows: First
label the p-values in ascending order such that p(1) ≤ p(2) ≤ ··· ≤ p(k) and denote by H(i) the null
hypothesis corresponding to p(i). Second define i0 to be the largest i for which p(i) ≤ 0.05i/k.
The decision rule is to reject all H(i) for i = 1,···,i0.
The purpose of this problem is to write a function named bh.adjust using the
Benjamini and Hochberg algorithm. The arguments of this function include a numerical
vector pvs of p-values and a significance level α (set its default value equal to 0.05). The
output will be a logical vector of the same length as pvs, using FALSE (F) to denote
rejecting a specific null hypothesis. For example, (T, F, T) means that the algorithm rejects
the second null hypothesis but fails to reject the first and the third null hypotheses.
Note that the output should be the final decisions for the series of hypotheses in the original
order based on the Benjamini and Hochberg (1995) algorithm.
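A minimal sketch consistent with the specification above (TRUE = fail to reject, FALSE = reject):
    bh.adjust <- function(pvs, alpha = 0.05) {
      k   <- length(pvs)
      ord <- order(pvs)                      # positions of the ascending p-values
      ok  <- sort(pvs) <= alpha * (1:k) / k  # p(i) <= alpha * i / k
      out <- rep(TRUE, k)                    # TRUE = fail to reject
      if (any(ok)) {
        i0 <- max(which(ok))                 # largest i with p(i) <= alpha * i / k
        out[ord[1:i0]] <- FALSE              # reject H(i) for i = 1, ..., i0
      }
      out                                    # decisions in the original order
    }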
Apply your function bh.adjust to each of the following vectors as pvs. Interpret your
output in words regarding which null hypotheses are rejected and which are not rejected.
• pvs=c(0.04, 0.02)