Logistic Regression in Case-Control Study Using R: A Statistical Tool
Satish Gupta
What is R?
 R is a free, open-source statistical programming language
and software environment.
 The language is very powerful for writing programs.
 Many statistical functions are already built in.
 Contributed packages expand the functionality to
cutting edge research.
Getting Started
 Go to www.r-project.org
 Downloads: CRAN (Comprehensive R Archive
Network)
 Set your mirror: choose a location close to you.
 Select Windows 95 or later, MacOS or UNIX
platforms
Basic operators and calculations
Comparison operators
 equal: ==
 not equal: !=
 greater/less than: > <
 greater/less than or equal: >= <=
Example: 1 == 1 # Returns TRUE
Basic operators and calculations
Logical operators
 AND: &
x <- 1:10; y <- 10:1 # Creates the sample vectors 'x' and 'y'.
x > y & x > 5 # Returns TRUE where both comparisons return TRUE.
 OR: |
x == y | x != y # Returns TRUE where at least one comparison is TRUE.
 NOT: !
!x > y # Parsed as !(x > y): the '!' sign returns the negation (opposite) of the logical vector x > y.
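
As a quick check of the operators above, the following sketch evaluates each expression on the same sample vectors and uses which() to list the positions that are TRUE. The call to which() and the names both_true, either_true and negated are illustrative additions, not part of the original slides.

```r
# Sample vectors from the slide
x <- 1:10
y <- 10:1

# AND: TRUE only where x exceeds y and x also exceeds 5
both_true <- x > y & x > 5
which(both_true)    # positions 6 to 10

# OR: every element is either equal or not equal, so all are TRUE
either_true <- x == y | x != y
all(either_true)

# NOT: '!' binds more loosely than '>', so this is !(x > y)
negated <- !x > y
which(negated)      # positions 1 to 5
```

Note that & and | work element by element on whole vectors, which is why each expression returns ten logical values rather than one.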
Basic operators and calculations
Calculations
 Four basic arithmetic functions: addition, subtraction,
multiplication and division
1 + 1; 1 - 1; 1 * 1; 1 / 1 # Returns the results of basic arithmetic calculations.
 Calculations on vectors
x <- 1:10; sum(x); mean(x); sd(x); sqrt(x) # Calculates the sum, mean and standard deviation of the vector x, and the square root of each element.
x <- 1:10; y <- 1:10; x + y # Adds the vectors x and y element by element.
R-Graphics
R provides comprehensive graphics utilities for
visualizing and exploring scientific data. It includes:
 Scatter plots
 Line plots
 Bar plots
 Pie charts
 Heatmaps
 Venn diagrams
 Density plots
 Box plots
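
A minimal sketch of several of these plot types using base R functions only. The data are made up for illustration, and the plots are written to a temporary PDF file (an assumption made here so that the sketch also runs in a non-interactive session). Heatmaps and Venn diagrams are left out of this sketch.

```r
# Hypothetical example data (not from the slides)
set.seed(1)
dose  <- 1:30
resp  <- 2 * dose + rnorm(30, sd = 5)
group <- rep(c("A", "B", "C"), each = 10)

# Send all plots to a temporary PDF file
plot_file <- file.path(tempdir(), "r_graphics_demo.pdf")
pdf(plot_file)
plot(dose, resp, main = "Scatter plot")
plot(dose, resp, type = "l", main = "Line plot")
barplot(table(group), main = "Bar plot")
pie(table(group), main = "Pie chart")
plot(density(resp), main = "Density plot")
boxplot(resp ~ group, main = "Box plot")
dev.off()
```

In an interactive session the pdf() and dev.off() calls can be dropped so that each plot appears in the graphics window.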
Data handling in R
 Load data: mydata = read.csv("/path/mydata.csv")
 See data on screen: print(mydata)
 See top part of data: head(mydata)
 Specific rows and columns of data:
mydata[1:10,1:3]
 To get the type of data: class(mydata)
 Changing class of data: newdata = as.matrix(mydata)
 Summary of data: summary(mydata)
 Selecting (KEEPING) variables (columns)
newdata = mydata[c(1,3:5)]
Data handling in R
 Selecting observations
newdata = subset(mydata, age >= 20 | age < 10,
select = c(ID, weight))
newdata = subset(mydata, sex == "Male" & age > 25,
select = weight:income)
 Excluding (DROPPING) variables (columns)
newdata = mydata[c(-3,-5)]
mydata$v3 = NULL
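
The data-handling commands above can be tried without an input file. The sketch below builds a small, made-up data frame (the values and the result names kept, selected, men_over_25 and dropped are hypothetical) and applies the same selection and exclusion steps.

```r
# Hypothetical data frame standing in for mydata
mydata <- data.frame(
  ID     = 1:6,
  sex    = c("Male", "Female", "Male", "Female", "Male", "Female"),
  age    = c(8, 15, 22, 30, 41, 9),
  weight = c(25, 50, 70, 60, 82, 28),
  income = c(0, 0, 1200, 2500, 3100, 0),
  stringsAsFactors = FALSE
)

head(mydata, 3)   # first three rows
class(mydata)     # "data.frame"

# Keep columns 1, 3, 4 and 5 (ID, age, weight, income)
kept <- mydata[c(1, 3:5)]

# Keep rows where age is at least 20 or below 10, and only ID and weight
selected <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))

# Keep males older than 25, and the columns from weight to income
men_over_25 <- subset(mydata, sex == "Male" & age > 25, select = weight:income)

# Drop columns 3 and 5 (age and income)
dropped <- mydata[c(-3, -5)]
```

Assigning each result to a new name, rather than overwriting newdata each time, keeps the original data frame intact for comparison.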
R-Library
 Many tools, distributed as "packages", are available in R for
different kinds of analysis, including the analysis of genetic and
genomic data.
 Depending on where a package is hosted, it can be
installed from one of two sources:
Using CRAN (Comprehensive R Archive Network) as:
install.packages("package_name")
Using Bioconductor as:
source("http://bioconductor.org/biocLite.R")
biocLite("package_name")
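
The biocLite() installer has since been retired by the Bioconductor project in favour of the BiocManager package, which is installed from CRAN. A minimal sketch of the newer route follows; it is wrapped in a hypothetical helper named install_bioc() (not a Bioconductor function) so that nothing is installed until the helper is called.

```r
# Hypothetical helper: the name install_bioc is an illustration only
install_bioc <- function(package_name) {
  # Install BiocManager from CRAN if it is not already available
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(package_name)
}

# Usage (requires an internet connection):
# install_bioc("package_name")
```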
R-Library
 To load a package,
library() #Lists all libraries/packages that are available on a system.
library(genetics) #Loads the package for genetics data analysis
library(help=genetics) #Lists all functions/objects of the "genetics" package
?function_name #Opens the documentation of a function
What is Logistic Regression?
 Logistic regression describes the relationship between
a dichotomous response variable and a set of
explanatory variables.
 Logistic regression is often used because the
relationship between the dependent variable (DV, a discrete
variable) and a predictor is non-linear.
Logistic Regression
 A General Model:

$$\operatorname{logit}(p_{\mathrm{disease}}) = \log\left(\frac{p_{\mathrm{disease}}}{1 - p_{\mathrm{disease}}}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_J X_J$$

Where:
p_disease is the probability that an individual has a particular
disease.
β0 is the intercept
β1, β2 … βJ are the coefficients (effects) of genetic factors
X1, X2 … XJ are the variables of genetic factors
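
To make the model concrete, the sketch below uses two made-up coefficients (beta0 and beta1 are hypothetical values, not estimates from any data) and the base R functions plogis() and qlogis(), which compute the inverse logit and the logit. It also shows that the odds ratio for a one-unit increase in a predictor equals exp(β).

```r
# Hypothetical coefficients for a model with one predictor (illustration only)
beta0 <- -1.0
beta1 <- 0.7

# Linear predictor (log-odds) for X1 = 0 and X1 = 1
x1 <- c(0, 1)
log_odds <- beta0 + beta1 * x1

# Inverse logit: convert log-odds back to probabilities
p_disease <- plogis(log_odds)

# qlogis() is the logit, so it recovers the linear predictor
all.equal(qlogis(p_disease), log_odds)

# Odds ratio for a one-unit increase in X1 equals exp(beta1)
odds <- p_disease / (1 - p_disease)
odds_ratio <- odds[2] / odds[1]
all.equal(odds_ratio, exp(beta1))
```

This is why the analysis output is reported as odds ratios: each coefficient is exponentiated before being displayed.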
Assumptions
 Logistic regression does not make any assumptions
of normality, linearity, and homogeneity of variance
for the independent variables.
 Because it does not impose these requirements, it is
preferred to discriminant analysis when the data does
not satisfy these assumptions.
Questions ??
 What is the relative importance of each predictor variable?
 How does each predictor variable affect the outcome?
 Does a predictor variable make the solution better or
worse or have no effect?
 Are there interactions among predictors?
 Does adding interactions among predictors
(continuous or categorical) improve the model?
 What is the strength of association between the outcome
variable and a set of predictors?
 Often in model comparison you want non-significant
differences so strength of association is reported for
even non-significant effects.
Types of Logistic Regression
 Unconditional logistic regression
 Conditional logistic regression
** Rules of thumb
 Use conditional logistic regression if matching has been done,
and unconditional if there has been no matching.
 When in doubt, use conditional because it always gives
unbiased results. The unconditional method is said to
overestimate the odds ratio if it is not appropriate.
Data Format
Status  Matset  Se_Quartiles  GPX1  GPX4  SEP15  TXN2
1       1       <60           CT    TT    AG     AG
0       1       >60 – 70      CC    CC    GG     GG
1       2       <60           TT    CC    AG     AA
0       2       >70 – 80      CC    CT    GG     GG
1       3       >80           CC    CC    AA     AA
0       3       >60 – 70      CT    TT    GG     GG
1       4       <60           CC    CC    AA     AG
0       4       >70 – 80      TT    TT    GG     GG
1       5       >80           CC    CC    AG     AA
0       5       <60           CC    CC    GG     GG
1       6       >70 – 80      CT    TT    AA     AA
0       6       >80           CC    CC    GG     AG
1       7       >60 – 70      TT    CC    AA     AG
Data and Library loading
 Load and use data in R (using lung cancer data from
PLoS One 2013, 8(3):e59051).
lung = read.csv("/path/lung.csv", sep = "\t", header = TRUE)
 Load the library and use data for analysis
library(epicalc)
use(lung)
Data Analysis
 Performing conditional logistic regression (Case vs. Control)
clogit_lung = clogit(Status ~ Se_Quartiles + strata(Matset), data = .data)
clogistic.display(clogit_lung)
                      OR (95% CI)          P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                          <0.001
   >60 – 70           0.4 (0.15 – 1.09)    0.074
   >70 – 80           0.11 (0.03 – 0.33)   <0.001
   >80                0.10 (0.03 – 0.34)   <0.001
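
For readers without the lung cancer file, the following sketch fits a conditional logistic regression with clogit() from the survival package on made-up 1:1 matched pairs. The pair counts, the binary Exposure variable and the object names are all hypothetical. It is an illustration of the technique, not a reproduction of the analysis above.

```r
library(survival)  # provides clogit() and strata()

# Hypothetical 1:1 matched pairs (not the lung cancer data):
# 6 pairs with only the case exposed, 20 with only the control exposed,
# 8 with both exposed and 6 with neither exposed
case_exposed    <- rep(c(1, 0, 1, 0), times = c(6, 20, 8, 6))
control_exposed <- rep(c(0, 1, 1, 0), times = c(6, 20, 8, 6))
n_pairs <- length(case_exposed)

matched <- data.frame(
  Matset   = rep(seq_len(n_pairs), times = 2),
  Status   = rep(c(1, 0), each = n_pairs),
  Exposure = c(case_exposed, control_exposed)
)

# Conditional logistic regression, stratified by matched set
fit <- clogit(Status ~ Exposure + strata(Matset), data = matched)

# Odds ratio with a Wald 95% confidence interval.
# For 1:1 matching the estimate equals the ratio of discordant pairs, 6/20 = 0.3.
or_table <- exp(cbind(OR = coef(fit), confint(fit)))
or_table
```

Only the discordant pairs (where case and control differ in exposure) carry information in the conditional analysis; the 14 concordant pairs do not affect the estimate.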
Data Analysis
 Performing conditional logistic regression (Case vs. Control),
clogit_lung = clogit(Status ~ GPX1+ strata(Matset), data = .data)
clogistic.display(clogit_lung)
                 OR (95% CI)          P (Wald's test)   P (LR-test)
GPX1: ref.=CC                                           0.032
   CT            0.44 (0.22 – 0.86)   0.017
   TT            0.42 (0.13 – 1.38)   0.151
Data Analysis
 Performing conditional logistic regression (Case vs. Control),
clogit_lung = clogit(Status ~ Se_Quartiles + GPX1+ strata(Matset), data = .data)
clogistic.display(clogit_lung)
                      crude OR (95% CI)    adj. OR (95% CI)     P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                                               <0.001
   >60 – 70           0.4 (0.15 – 1.09)    0.32 (0.11 – 0.96)   0.042
   >70 – 80           0.11 (0.03 – 0.33)   0.09 (0.02 – 0.3)    <0.001
   >80                0.1 (0.03 – 0.34)    0.05 (0.01 – 0.23)   <0.001
GPX1: ref.=CC                                                                     0.006
   CT                 0.44 (0.22 – 0.86)   0.26 (0.11 – 0.65)   0.004
   TT                 0.42 (0.13 – 1.38)   0.44 (0.09 – 2.18)   0.313
(Quartiles: environmental factor; GPX1: genetic factor)
Data Analysis
 Performing unconditional logistic regression (Case vs.
Control),
ulogit_lung = glm(Status ~ Se_Quartiles, family=binomial, data = .data)
logistic.display(ulogit_lung)
                      OR (95% CI)          P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                          <0.001
   >60 – 70           0.41 (0.17 – 1.02)   0.054
   >70 – 80           0.13 (0.05 – 0.34)   <0.001
   >80                0.17 (0.07 – 0.42)   <0.001
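
The unconditional model can likewise be sketched with base R alone. The data below are made up (14 of 40 cases and 28 of 40 controls exposed), and confint.default() is used here as one way to obtain Wald confidence intervals. The Exposure variable and the object names are hypothetical.

```r
# Hypothetical unmatched data: 14 of 40 cases and 28 of 40 controls exposed
unmatched <- data.frame(
  Status   = rep(c(1, 0), each = 40),
  Exposure = c(rep(c(1, 0), times = c(14, 26)),
               rep(c(1, 0), times = c(28, 12)))
)

# Unconditional logistic regression
fit <- glm(Status ~ Exposure, family = binomial, data = unmatched)

# Odds ratios with Wald 95% confidence intervals.
# With a single binary predictor the odds ratio equals the
# cross-product ratio of the 2 x 2 table: (14 * 12) / (26 * 28).
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
or_table["Exposure", ]
```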
Data Analysis
 Performing unconditional logistic regression (Case vs.
Control),
ulogit_lung = glm(Status ~ GPX1 , family=binomial, data = .data)
logistic.display(ulogit_lung)
                 OR (95% CI)          P (Wald's test)   P (LR-test)
GPX1: ref.=CC                                           0.034
   CT            0.45 (0.24 – 0.85)   0.014
   TT            0.44 (0.14 – 1.36)   0.156
Data Analysis
 Performing unconditional logistic regression (Case vs.
Control),
ulogit_lung = glm(Status ~ Se_Quartiles + GPX1, family=binomial, data = .data)
logistic.display(ulogit_lung)
                      crude OR (95% CI)    adj. OR (95% CI)     P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                                               <0.001
   >60 – 70           0.41 (0.17 – 1.02)   0.43 (0.17 – 1.08)   0.074
   >70 – 80           0.13 (0.05 – 0.34)   0.13 (0.05 – 0.34)   <0.001
   >80                0.17 (0.07 – 0.42)   0.15 (0.06 – 0.39)   <0.001
GPX1: ref.=CC                                                                     0.024
   CT                 0.45 (0.24 – 0.85)   0.40 (0.20 – 0.80)   0.01
   TT                 0.44 (0.14 – 1.36)   0.42 (0.12 – 1.41)   0.161
Something More
 Changing the default reference
GPX1 = relevel(GPX1, ref = "TT")
pack()
 Saving the result
result = clogistic.display(clogit_lung)
write.csv(result$table, file = "path/result.csv")
write.table(result$table, file = "path/result.xls", sep = "\t")
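
The effect of relevel() and of a tab-separated export can be checked on a small, made-up genotype vector. The object names and the temporary output file below are hypothetical.

```r
# Hypothetical genotype factor; levels are sorted alphabetically by default
genotype <- factor(c("CC", "CT", "TT", "CC", "CT"))
levels(genotype)       # "CC" "CT" "TT": CC is the reference level

# Move TT to the front so that it becomes the reference level
genotype_tt <- relevel(genotype, ref = "TT")
levels(genotype_tt)    # "TT" "CC" "CT"

# Save a small table of counts as a tab-separated file
counts <- as.data.frame(table(genotype_tt))
out_file <- file.path(tempdir(), "result.txt")
write.table(counts, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)
```

The first level of a factor is the one that regression functions treat as the reference category, which is why reordering the levels changes which odds ratios are reported.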
Summary: regression models
 Regression models can be used to describe the
average effect of predictors on outcomes in your data
set.
 They can tell how likely it is that the effect is just due
to chance.
 They can look at each predictor “adjusting for” the
others (estimating what would happen if all others
were held constant.)
Thanks to,
Prof. Virasakdi Chongsuvivatwong
Epidemiology Unit,
Faculty of Medicine,
Prince of Songkla University, Thailand
Logistic Regression in Case-Control Study

  • 1. Logistic Regression in Case- Control study using – A statistical tool Satish Gupta
  • 2. What is R?  The R statistical programming language is a free open source package.  The language is very powerful for writing programs.  Many statistical functions are already built in.  Contributed packages expand the functionality to cutting edge research.
  • 3. Getting Started  Go to www.r-project.org  Downloads: CRAN (Comprehensive R Archive Network)  Set your Mirror: location close to you.  Select Windows 95 or later, MacOS or UNIX platforms
  • 5. Basic operators and calculations Comparison operators  equal: ==  not equal: !=  greater/less than: > <  greater/less than or equal: >= <= Example: 1 == 1 # Returns TRUE
  • 6. Basic operators and calculations Logical operators  AND: & x <- 1:10; y <- 10:1 # Creates the sample vectors 'x' and 'y'. x > y & x > 5 # Returns TRUE where both comparisons return TRUE.  OR: | x == y | x != y # Returns TRUE where at least one comparison is TRUE.  NOT: ! !x > y # The '!' sign returns the negation (opposite) of a logical vector.
  • 7. Basic operators and calculations Calculations  Four basic arithmetic functions: addition, subtraction, multiplication and division 1 + 1; 1 - 1; 1 * 1; 1 / 1 # Returns results of basic arithmetic calculations.  Calculations on vectors x <- 1:10; sum(x); mean(x), sd(x); sqrt(x) # Calculates for the vector x its sum, mean, standard deviation and square root. x <- 1:10; y <- 1:10; x + y # Calculates the sum for each element in the vectors x and y.
  • 8. R-Graphics
      R provides comprehensive graphics utilities for visualizing and exploring scientific data. These include:
      - Scatter plots
      - Line plots
      - Bar plots
      - Pie charts
      - Heatmaps
      - Venn diagrams
      - Density plots
      - Box plots
  • 9. Data handling in R
      - Load data: mydata = read.csv("/path/mydata.csv")
      - See data on screen: print(mydata)
      - See top part of data: head(mydata)
      - Specific rows and columns of data: mydata[1:10, 1:3]
      - Get the type of the data: class(mydata)
      - Change the class of the data: newdata = as.matrix(mydata)
      - Summary of data: summary(mydata)
      - Selecting (KEEPING) variables (columns): newdata = mydata[c(1, 3:5)]
  • 10. Data handling in R
      - Selecting observations
        newdata = subset(mydata, age >= 20 | age < 10, select = c(ID, weight))
        newdata = subset(mydata, sex == "Male" & age > 25, select = weight:income)
      - Excluding (DROPPING) variables (columns)
        newdata = mydata[c(-3, -5)]
        mydata$v3 = NULL
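The selection and exclusion commands on slides 9 and 10 can be tried without a CSV file. The sketch below is illustrative only: the small data frame and its column names (ID, sex, age, weight, income) are assumptions standing in for mydata, not data from the talk.

```r
# Hypothetical data frame standing in for 'mydata' (values are made up).
mydata <- data.frame(
  ID     = 1:6,
  sex    = c("Male", "Female", "Male", "Male", "Female", "Male"),
  age    = c(8, 15, 22, 30, 41, 27),
  weight = c(25, 50, 70, 82, 60, 76),
  income = c(0, 0, 1200, 3000, 2500, 2800)
)

# Keep rows where age is at least 20 OR below 10; keep only ID and weight.
young_or_adult <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))

# Keep males older than 25; keep the run of columns from weight to income.
men_over_25 <- subset(mydata, sex == "Male" & age > 25, select = weight:income)

# Drop the 3rd and 5th columns (age and income) by negative index.
dropped <- mydata[c(-3, -5)]

print(young_or_adult)
print(men_over_25)
print(names(dropped))
```

Negative indices drop columns by position, so they silently select different variables if the column order changes; selecting by name is usually safer in real analyses.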
  • 11. R-Library
      - R offers many tools, distributed as "packages", for different kinds of analysis, including analysis of genetics and genomics data.
      - Depending on where a package is hosted, it can be installed from one of two sources:
        Using CRAN (Comprehensive R Archive Network):
          install.packages("package_name")
        Using Bioconductor:
          source("http://bioconductor.org/biocLite.R")
          biocLite("package_name")
        (Recent Bioconductor releases replace biocLite() with BiocManager::install("package_name").)
  • 12. R-Library
      - To load a package:
        library() # Lists all libraries/packages that are available on a system.
        library(genetics) # Loads the package for genetics data analysis.
        library(help=genetics) # Lists all functions/objects of the "genetics" package.
        ?function # Opens the documentation of a function.
  • 13. What is Logistic Regression?
      - Logistic regression describes the relationship between a dichotomous response variable and a set of explanatory variables.
      - Logistic regression is often used because the relationship between the dependent variable (DV, a discrete variable) and a predictor is non-linear.
  • 14. A General Model: Logistic Regression

      $$\operatorname{logit}(p_{disease}) = \log\left(\frac{p_{disease}}{1 - p_{disease}}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_J X_J$$

      Where:
      - $p_{disease}$ is the probability that an individual has a particular disease.
      - $\beta_0$ is the intercept.
      - $\beta_1, \beta_2, \ldots, \beta_J$ are the coefficients (effects) of the genetic factors.
      - $X_1, X_2, \ldots, X_J$ are the variables for the genetic factors.
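To make the model on slide 14 concrete, here is a minimal numeric sketch of the logit and its inverse for a single binary factor. The coefficient values beta0 and beta1 are hypothetical, chosen only for illustration, and the helper names logit and inv_logit are ours (base R also offers qlogis() and plogis() for the same purpose).

```r
# logit maps a probability to log odds; inv_logit maps log odds back.
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical coefficients (assumptions, not estimates from the talk).
beta0 <- -1.0   # intercept: log odds of disease when X1 = 0
beta1 <-  0.8   # effect of the factor X1 on the log odds

x1  <- c(0, 1)              # factor absent / present
eta <- beta0 + beta1 * x1   # linear predictor on the logit scale
p   <- inv_logit(eta)       # implied disease probabilities

# The odds ratio for X1 is exp(beta1), whatever the intercept is.
odds       <- p / (1 - p)
odds_ratio <- odds[2] / odds[1]

print(p)
print(c(from_odds = odds_ratio, from_beta = exp(beta1)))
```

This is why the displays later in the deck report exponentiated coefficients: on the odds scale each coefficient becomes a multiplicative effect.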
  • 15. Assumptions
      - Logistic regression does not assume normality, linearity, or homogeneity of variance for the independent variables (it does assume that the log odds of the outcome are linear in the predictors).
      - Because it does not impose these requirements, it is preferred to discriminant analysis when the data do not satisfy these assumptions.
  • 16. Questions?
      - What is the relative importance of each predictor variable?
      - How does each predictor variable affect the outcome?
      - Does a predictor variable make the solution better or worse or have no effect?
      - Are there interactions among predictors?
      - Does adding interactions among predictors (continuous or categorical) improve the model?
      - What is the strength of association between the outcome variable and a set of predictors?
      - Often in model comparison you want non-significant differences, so strength of association is reported even for non-significant effects.
  • 17. Types of Logistic Regression
      - Unconditional logistic regression
      - Conditional logistic regression
      Rules of thumb:
      - Use conditional logistic regression if matching has been done, and unconditional if there has been no matching.
      - When in doubt, use conditional because it always gives unbiased results. The unconditional method is said to overestimate the odds ratio when it is not appropriate.
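The difference between the two approaches on slide 17 can be explored with simulated 1:1 matched data. This is a sketch under stated assumptions: the data are simulated, the variable names (matset, status, exposed) are hypothetical, and clogit() and strata() come from the survival package, which ships with standard R installations.

```r
library(survival)  # provides clogit() and strata()

set.seed(1)
n_sets <- 100

# One case (status = 1) and one control (status = 0) per matched set.
d <- data.frame(
  matset = rep(seq_len(n_sets), each = 2),
  status = rep(c(1, 0), times = n_sets)
)

# Hypothetical binary exposure, simulated to be more common among cases.
d$exposed <- rbinom(nrow(d), 1, ifelse(d$status == 1, 0.6, 0.4))

# Conditional model: conditions on the matched set through strata().
cond_fit <- clogit(status ~ exposed + strata(matset), data = d)

# Unconditional model: ignores the matching.
uncond_fit <- glm(status ~ exposed, family = binomial, data = d)

# For 1:1 matching with one binary exposure, the conditional odds ratio
# equals the ratio of the two kinds of discordant pairs.
cases    <- d$exposed[d$status == 1]
controls <- d$exposed[d$status == 0]
n10 <- sum(cases == 1 & controls == 0)  # case exposed, control not
n01 <- sum(cases == 0 & controls == 1)  # control exposed, case not

print(c(conditional   = unname(exp(coef(cond_fit))),
        discordant    = n10 / n01,
        unconditional = unname(exp(coef(uncond_fit)["exposed"]))))
```

Only discordant pairs contribute to the conditional estimate, which is why the matched-set identifier has to enter the model through strata() rather than as an ordinary covariate.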
  • 18. Data Format

      Status  Matset  Se_Quartiles  GPX1  GPX4  SEP15  TXN2
      1       1       <60           CT    TT    AG     AG
      0       1       >60 – 70      CC    CC    GG     GG
      1       2       <60           TT    CC    AG     AA
      0       2       >70 – 80      CC    CT    GG     GG
      1       3       >80           CC    CC    AA     AA
      0       3       >60 – 70      CT    TT    GG     GG
      1       4       <60           CC    CC    AA     AG
      0       4       >70 – 80      TT    TT    GG     GG
      1       5       >80           CC    CC    AG     AA
      0       5       <60           CC    CC    GG     GG
      1       6       >70 – 80      CT    TT    AA     AA
      0       6       >80           CC    CC    GG     AG
      1       7       >60 – 70      TT    CC    AA     AG
  • 19. Data and Library loading
      - Load the data in R (using lung cancer data from PLoS One 2013, 8(3):e59051).
        lung = read.csv("/path/lung.csv", sep = "\t", header = TRUE)
      - Load the library and attach the data for analysis.
        library(epicalc)
        use(lung)
  • 20. Data Analysis
      - Performing conditional logistic regression (Case vs. Control)
        clogit_lung = clogit(Status ~ Se_Quartiles + strata(Matset), data = .data)
        clogistic.display(clogit_lung)

                               OR(95%CI)            P(Wald's test)   P(LR-test)
        Quartiles: ref.=<60                                          <0.001
           >60 – 70            0.4 (0.15 – 1.09)    0.074
           >70 – 80            0.11 (0.03 – 0.33)   <0.001
           >80                 0.10 (0.03 – 0.34)   <0.001
  • 21. Data Analysis
      - Performing conditional logistic regression (Case vs. Control)
        clogit_lung = clogit(Status ~ GPX1 + strata(Matset), data = .data)
        clogistic.display(clogit_lung)

                               OR(95%CI)            P(Wald's test)   P(LR-test)
        GPX1: ref.=CC                                                0.032
           CT                  0.44 (0.22 – 0.86)   0.017
           TT                  0.42 (0.13 – 1.38)   0.151
  • 22. Data Analysis
      - Performing conditional logistic regression (Case vs. Control)
        clogit_lung = clogit(Status ~ Se_Quartiles + GPX1 + strata(Matset), data = .data)
        clogistic.display(clogit_lung)

                               crude OR(95%CI)      adj. OR(95%CI)       P(Wald's test)   P(LR-test)
        Quartiles: ref.=<60                                                               <0.001
           >60 – 70            0.4 (0.15 – 1.09)    0.32 (0.11 – 0.96)   0.042
           >70 – 80            0.11 (0.03 – 0.33)   0.09 (0.02 – 0.3)    <0.001
           >80                 0.1 (0.03 – 0.34)    0.05 (0.01 – 0.23)   <0.001
        GPX1: ref.=CC                                                                     0.006
           CT                  0.44 (0.22 – 0.86)   0.26 (0.11 – 0.65)   0.004
           TT                  0.42 (0.13 – 1.38)   0.44 (0.09 – 2.18)   0.313

        (Slide labels: Quartiles is the environmental factor; GPX1 is the genetic factor.)
  • 23. Data Analysis
      - Performing unconditional logistic regression (Case vs. Control)
        ulogit_lung = glm(Status ~ Se_Quartiles, family = binomial, data = .data)
        logistic.display(ulogit_lung)

                               OR(95%CI)            P(Wald's test)   P(LR-test)
        Quartiles: ref.=<60                                          <0.001
           >60 – 70            0.41 (0.17 – 1.02)   0.054
           >70 – 80            0.13 (0.05 – 0.34)   <0.001
           >80                 0.17 (0.07 – 0.42)   <0.001
  • 24. Data Analysis
      - Performing unconditional logistic regression (Case vs. Control)
        ulogit_lung = glm(Status ~ GPX1, family = binomial, data = .data)
        logistic.display(ulogit_lung)

                               OR(95%CI)            P(Wald's test)   P(LR-test)
        GPX1: ref.=CC                                                0.034
           CT                  0.45 (0.24 – 0.85)   0.014
           TT                  0.44 (0.14 – 1.36)   0.156
  • 25. Data Analysis
      - Performing unconditional logistic regression (Case vs. Control)
        ulogit_lung = glm(Status ~ Se_Quartiles + GPX1, family = binomial, data = .data)
        logistic.display(ulogit_lung)

                               crude OR(95%CI)      adj. OR(95%CI)       P(Wald's test)   P(LR-test)
        Quartiles: ref.=<60                                                               <0.001
           >60 – 70            0.41 (0.17 – 1.02)   0.43 (0.17 – 1.08)   0.074
           >70 – 80            0.13 (0.05 – 0.34)   0.13 (0.05 – 0.34)   <0.001
           >80                 0.17 (0.07 – 0.42)   0.15 (0.06 – 0.39)   <0.001
        GPX1: ref.=CC                                                                     0.024
           CT                  0.45 (0.24 – 0.85)   0.40 (0.20 – 0.80)   0.01
           TT                  0.44 (0.14 – 1.36)   0.42 (0.12 – 1.41)   0.161
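The OR(95%CI) columns in the unconditional displays above can be approximated in base R by exponentiating the glm coefficients and their Wald confidence limits. The sketch below is an assumption-laden illustration rather than the deck's own method: the genotype data are simulated, the true effects are made up, and confint.default() (Wald intervals) is used as a stand-in for whatever logistic.display() computes internally.

```r
set.seed(42)
n <- 400

# Simulated three-level genotype with CC as the reference level.
genotype <- factor(
  sample(c("CC", "CT", "TT"), n, replace = TRUE, prob = c(0.5, 0.35, 0.15)),
  levels = c("CC", "CT", "TT")
)

# Hypothetical true log odds: CT and TT are both protective.
effect <- c(CC = 0, CT = -0.8, TT = -0.8)
eta    <- 0.3 + effect[as.character(genotype)]
status <- rbinom(n, 1, plogis(eta))

fit <- glm(status ~ genotype, family = binomial)

# Odds ratios with Wald 95% limits: exponentiate estimates and limits.
or_table <- cbind(OR = exp(coef(fit)), exp(confint.default(fit)))

# Drop the intercept row; the rest compare CT and TT against CC.
print(round(or_table[-1, ], 2))
```

Because the limits are computed on the log-odds scale and then exponentiated, the resulting intervals are asymmetric around the odds ratio, as in the tables above.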
  • 26. Something More
      - Changing the default reference level
        GPX1 = relevel(GPX1, ref = "TT")
        pack()
      - Saving the result
        result = clogistic.display(clogit_lung)
        write.csv(result$table, file = "path/result.csv")
        write.table(result$table, file = "path/result.xls", sep = "\t")
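The two steps on slide 26 can be checked in isolation with base R. This sketch uses a made-up genotype vector and a temporary file, both assumptions for illustration; it shows how relevel() moves the reference level to the front and how a tab separator is written as "\t".

```r
# Hypothetical genotype factor; levels default to alphabetical order.
gpx1 <- factor(c("CC", "CT", "TT", "CC", "CT"))
print(levels(gpx1))

# relevel() moves the chosen level first, making it the reference
# category in any model fitted afterwards.
gpx1_tt <- relevel(gpx1, ref = "TT")
print(levels(gpx1_tt))

# Save a small table as tab-separated text and read it back.
counts <- as.data.frame(table(gpx1_tt))
tmp    <- tempfile(fileext = ".tsv")
write.table(counts, file = tmp, sep = "\t", quote = FALSE, row.names = FALSE)
back   <- read.delim(tmp)
print(back)
```

write.csv() always writes comma-separated output and does not accept a different sep, so write.table() is the function to use for tab-separated files.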
  • 27. Summary: regression models
      - Regression models can be used to describe the average effect of predictors on outcomes in your data set.
      - They can tell you how likely it is that an effect is due to chance alone.
      - They can look at each predictor "adjusting for" the others (estimating what would happen if all others were held constant).
  • 28. Thanks to, Prof. Virasakdi Chongsuvivatwong Epidemiology Unit, Faculty of Medicine, Prince of Songkla University, Thailand

Editor's Notes

  • #15: Coefficients are estimated by maximum likelihood estimation (MLE).
  • #21: In order to test hypotheses in logistic regression, we have used the likelihood ratio test and the Wald test.
  • #22: If the confidence interval for a difference in means includes 0, we can say that there is no significant difference between the means of the two populations at the given level of confidence. The width of the confidence interval gives some idea of how uncertain we are about the difference in the means. A very wide interval may indicate that more data should be collected before anything definite can be said. For an odds ratio, a confidence interval that includes 1.0 means that the association between the exposure and outcome could have been found by chance alone and that the association is not statistically significant.
  • #26: family = binomial specifies the choice of variance and link functions: the variance is binomial and the link is the logit function.
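Notes #21 and #22 mention the Wald test, the likelihood ratio test, and reading significance from whether an odds-ratio interval includes 1.0. The following is a minimal base-R sketch of all three on simulated data; the variable names and the true effect size are assumptions made for illustration.

```r
set.seed(7)
n <- 300

# Simulated binary exposure and outcome (hypothetical effect of 0.9).
exposure <- rbinom(n, 1, 0.5)
status   <- rbinom(n, 1, plogis(-0.2 + 0.9 * exposure))

fit0 <- glm(status ~ 1,        family = binomial)  # null model
fit1 <- glm(status ~ exposure, family = binomial)  # model with exposure

# Wald test: the estimate divided by its standard error.
wald_p <- summary(fit1)$coefficients["exposure", "Pr(>|z|)"]

# Likelihood ratio test: drop in deviance between the nested models.
lr   <- anova(fit0, fit1, test = "Chisq")
lr_p <- lr[2, "Pr(>Chi)"]

# Wald 95% interval for the odds ratio, and whether it excludes 1.
ci            <- exp(confint.default(fit1)["exposure", ])
ci_excludes_1 <- unname(ci[1] > 1 || ci[2] < 1)

print(c(wald_p = wald_p, lr_p = lr_p))
print(ci)
print(ci_excludes_1)
```

The Wald interval excludes 1 exactly when the Wald p-value is below 0.05, since both come from the same normal approximation; the likelihood ratio p-value is usually close but not identical.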