The document discusses the Log-Rank Test, a statistical method for comparing survival distributions among groups, particularly in clinical trials. It explains the test's benefits, such as handling censored data and its non-parametric nature, and provides examples of its application. Additionally, it covers stratification in survival analysis and Fisher's Exact Test for assessing associations between binary variables.
The Team Youssef Sayed Muhamad Shokry Assem Khalifa
Naira Magdy Salam Essam Basel Mohamed
Mahmoud Abdel Fattah Mohamed Anter
Introduction
We will talk about:
What is our goal?
What is the survival time?
What are the defined functions of survival time?
What is the Kaplan-Meier estimator?
Lecture Motivation

Log-Rank Test

What is the Log-Rank Test?
The Log-Rank Test is a statistical method used to compare the survival distributions of two or more groups. It assesses whether there are significant differences in survival times among different treatment or intervention groups. The test is widely used in clinical trials, especially in oncology and other medical research fields.

Why Use the Log-Rank Test?
Time-to-Event Comparisons: Use it when comparing the time it takes for an event (e.g., death, recovery, failure) to occur in different groups.
Simplicity: Easy to implement and interpret, making it accessible for researchers.
Non-Parametric: Does not assume a specific survival time distribution, making it suitable for various data types.
Handles Censored Data: Accommodates censored observations (e.g., patients lost to follow-up), so these patients still contribute to the analysis up to the time they were last observed.
Focus on Survival Curves: The most common method for comparing survival curves, helping determine whether one group has better survival outcomes than another.

Examples of the Log-Rank Test
Clinical Trial: Comparing the survival times of patients treated with Drug A vs. Drug B.
Heart Disease Study: Comparing survival rates of patients who underwent surgery vs. those on medication.
Smoking and Lung Disease: Comparing survival times of smokers vs. non-smokers after being diagnosed with lung disease.

How Does the Log-Rank Test Work?
Calculation: It compares the observed number of events (e.g., deaths) in each group to the number of events expected under the null hypothesis of no difference in survival.

Example Scenario
You’re conducting a clinical trial to compare the survival times of patients receiving Treatment A and Treatment B for a specific type of cancer.
You follow the patients for a period of time, recording their survival times and noting whether they were censored (meaning the patient either had not experienced the event by the end of the study or was lost to follow-up).

Example Data

Calculate Expected Events at Each Time Point

Apply Log-Rank Test Formula

Calculate Total Log-Rank Statistic

Conclusion
The log-rank statistic (0.476) is compared to a chi-square distribution with 1 degree of freedom. Using statistical tables or software, you would find the p-value.
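The observed-versus-expected comparison described above can be sketched in a few lines of Python (used here for illustration in place of R, with the standard library only). This is a minimal sketch, not the slides' own calculation: it assumes the common hypergeometric-variance form of the two-group statistic, and the function names and the data are hypothetical, since the original example table is not preserved.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (event observed) or 0 (censored).

    Returns (observed_a, expected_a, variance, chi_square).
    """
    # Pool both groups, tagging each record with its group (0 = A, 1 = B).
    records = [(t, e, 0) for t, e in zip(times_a, events_a)]
    records += [(t, e, 1) for t, e in zip(times_b, events_b)]

    observed_a = expected_a = variance = 0.0
    # Only time points with at least one observed event contribute.
    for t in sorted({time for time, event, _ in records if event == 1}):
        at_risk_a = sum(1 for time, _, g in records if time >= t and g == 0)
        at_risk_b = sum(1 for time, _, g in records if time >= t and g == 1)
        deaths_a = sum(1 for time, e, g in records if time == t and e == 1 and g == 0)
        deaths_b = sum(1 for time, e, g in records if time == t and e == 1 and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        observed_a += deaths_a
        # Under the null hypothesis, events split in proportion to the risk sets.
        expected_a += d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    chi_square = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    return observed_a, expected_a, variance, chi_square


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical data, NOT the values from the slides' example table.
times_a, events_a = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
times_b, events_b = [5, 8, 12, 18, 22, 30], [1, 1, 1, 0, 1, 0]
obs, exp, var, chi2 = logrank_two_groups(times_a, events_a, times_b, events_b)
print(obs, exp, var, chi2, chi_square_p_value_1df(chi2))

# p-value for the statistic quoted in the slides
print(chi_square_p_value_1df(0.476))
```

For the statistic 0.476 quoted above, the second helper gives a p-value of about 0.49.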
If the p-value is below the chosen significance level (typically 0.05), you reject the null hypothesis and conclude that there is a significant difference between the survival curves of Group A and Group B. Otherwise, you fail to reject the null hypothesis, indicating no significant difference.

Chi-Square Table

Why Use Chi-Square?
The chi-square statistic measures how much the observed events differ from the expected events.
Why it’s important: It helps us decide whether the difference between groups is plausibly due to chance or reflects a real difference.

Log-Rank Test in R

Stratification in Survival Analysis

What is Stratification?
Definition: Stratification is the process of dividing a study population into subgroups based on characteristics like age, gender, or risk factors.
Purpose: It improves the accuracy of the analysis by controlling for confounding variables that could influence survival outcomes.

Example of Stratification
Scenario:
Group 1: Patients receiving a new treatment.
Group 2: Patients receiving a standard treatment.
Objective: Compare the survival curves of these two groups.
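One way to make this comparison while holding a stratifying variable fixed is to compute the observed-minus-expected event count and its variance separately inside each subgroup, then pool them before forming a single chi-square statistic. The Python sketch below illustrates that idea under stated assumptions: the function names, the stratum labels, and all data are hypothetical, and the hypergeometric-variance form of the log-rank statistic is assumed.

```python
import math


def observed_minus_expected(records):
    """Return (O - E, variance) for group 0 within one stratum.

    records: list of (time, event, group), event in {0, 1}, group in {0, 1}.
    """
    diff = variance = 0.0
    for t in sorted({time for time, event, _ in records if event == 1}):
        at_risk = [g for time, _, g in records if time >= t]
        deaths = [g for time, event, g in records if time == t and event == 1]
        n, d = len(at_risk), len(deaths)
        n0, d0 = at_risk.count(0), deaths.count(0)
        diff += d0 - d * n0 / n
        if n > 1:
            variance += d * (n0 / n) * (1 - n0 / n) * (n - d) / (n - 1)
    return diff, variance


def stratified_logrank(strata):
    """strata: dict mapping a stratum label to its list of records.

    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    total_diff = total_var = 0.0
    # Patients are only ever compared with others in the same stratum.
    for records in strata.values():
        diff, var = observed_minus_expected(records)
        total_diff += diff
        total_var += var
    chi_square = total_diff ** 2 / total_var if total_var > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical data: group 0 = new treatment, group 1 = standard treatment.
strata = {
    "younger": [(5, 1, 0), (8, 1, 0), (12, 0, 0), (9, 1, 1), (14, 1, 1), (20, 0, 1)],
    "older": [(3, 1, 0), (6, 1, 0), (7, 0, 0), (4, 1, 1), (5, 1, 1), (10, 0, 1)],
}
print(stratified_logrank(strata))
```

Because the pooling happens after the within-stratum comparison, a difference in baseline risk between strata cannot masquerade as a treatment effect.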
Why Do We Stratify?
Remove Confounding Effects: Comparisons are made within subgroups, so factors like age cannot distort the comparison between treatment groups.
Increase Precision: Allows for detailed comparison of survival curves within subgroups.
Improve Validity: Ensures the difference observed is due to the factor studied and not other variables.

Example Data

Steps of Analysis

Step 1: Stratify Patients by Age
We divide the patients into two age groups:
Age Group 1: Patients younger than 50 years.
Age Group 2: Patients 50 years and older.

Step 2: Analyze Age Group 1 (<50 years)
Result: In this age group, patients receiving treatment B show longer survival times than those receiving treatment A.

Step 3: Analyze Age Group 2 (≥50 years)
Result: In this age group, patients receiving treatment A appear to have longer survival times than those receiving treatment B.

Step 4: Stratified Log-Rank Test
We use the stratified log-rank test to compare the survival curves for treatments A and B, accounting for the effect of age. Here is the complete R code that implements all the steps:

p-value (p = 0.5): A p-value of 0.5 is well above 0.05, so there is no statistically significant difference in survival between the two treatments (A and B).

Final Conclusions for Age Groups
Age Group < 50 Years: Treatment B may be more effective.
Age Group ≥ 50 Years: Treatment A appears to be more effective.
Because the stratified test is not significant, these within-group patterns are descriptive only.

Fisher’s Exact Test
Definition: A statistical hypothesis test used to assess the association between two binary variables in a contingency table. It is particularly useful for small samples. It is a non-parametric test, meaning it does not assume any particular distribution for the data.

What is the contingency table?
A contingency table displays the frequency (count) of each combination of two categorical variables. Each cell in the table shows how many times a specific combination occurs.

Expected Cell Counts
Fisher's Exact Test is typically used when more than 20% of the expected cell counts are less than 5. In a 2x2 contingency table each cell is 25% of the table, so a single cell with an expected count of less than 5 is enough.

Assumptions
Categorical and Binary Variables: Both variables must be categorical and binary, allowing for a 2x2 contingency table.
Independence: Observations should be randomly sampled and independent of one another, and each observation falls into exactly one cell.
Sample Size: If any cell in the contingency table has an expected count less than 5, use Fisher's Exact Test; if all expected counts are 5 or greater, a Chi-squared test is more appropriate.
Row and Column Totals: The row and column totals are fixed and not random.
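With both margins fixed, as assumed above, the count in one cell determines the whole 2x2 table and follows a hypergeometric distribution. The Python sketch below shows the expected-count check and an exact two-sided p-value under that assumption. The function names and the example table are hypothetical (it is not the coffee and cancer data), and the p-value sums every table that is no more likely than the observed one, which is one common convention.

```python
from math import comb


def expected_counts(table):
    """Expected count per cell: (row total * column total) / n."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def table_probability(a, b, c, d):
    """Hypergeometric probability of a 2x2 table given its fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_exact_two_sided(table):
    """Sum the probabilities of all tables no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    p_value = 0.0
    # x is the top-left cell; the margins fix the other three cells.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_x = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p_x <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p_x
    return min(p_value, 1.0)


# Hypothetical 2x2 table of counts: rows = exposure yes/no, columns = outcome yes/no.
table = [[3, 1], [1, 3]]
print(expected_counts(table))         # every expected count is 2.0, below 5
print(fisher_exact_two_sided(table))  # 34/70, about 0.486
```

Here all four expected counts are below 5, so the exact test is the appropriate choice, and the resulting p-value is far above 0.05.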
Fisher's Exact Test Steps
1. State the null and alternative hypotheses.
2. Populate the contingency table and calculate the exact probability.
The formula for the exact probability (P) of the observed table, with cell counts a, b, c, d and n = a + b + c + d, is
P = [(a+b)! (c+d)! (a+c)! (b+d)!] / [n! a! b! c! d!]
Check the condition: If the sample size is greater than or equal to 20, compute the expected count for each cell and check which are 5 or greater, then compute the cells ratio. For Fisher's Exact Test to apply, cells with an expected count of 5 or greater must make up less than 80% of the cells.
Expected Cell Counts Formula: expected count = (row total × column total) / n
The Cells Ratio Formula
3. Compute the other possible contingency tables with the same row and column totals, and their exact probabilities. Keep only the probabilities P1 and P2 that are smaller than or equal to P0, the probability of the observed table. Repeat this step twice, or until a cell count reaches 0, so that at most 3 values are summed.
4. Compute the p-value.
5. Conclusion and decision.

Example
Let’s take an example where we are interested in the association between coffee consumption (binary exposure: whether the people in a study drink coffee, yes/no) and cancer (binary outcome: whether people in the study developed cancer, yes/no) at the 5% level of significance.

Solution
Since n is 20, we need to check the expected cell counts and the cells ratio.

Step 1
The null hypothesis (H0): There is no association between coffee consumption and cancer in the study.
The alternative hypothesis (H1): There is an association between coffee consumption and cancer.

Step 2

Step 3
P1 must be less than or equal to P0, so we will not take this value.

Step 4

Step 5

Fisher’s Exact Test Using R

THANK YOU