Prof. Dr. Moustapha Ibrahim Salem Mansourms@alexu - Edu.eg 01005857099
This document discusses correlation and regression analysis. It defines correlation as measuring the strength and direction of the linear relationship between two variables, without implying causation. Regression is used to predict the value of one variable based on the other. Key points include:
- Correlation is measured on a scale from -1 to 1. Values closer to -1 or 1 indicate a stronger linear relationship, and the sign gives its direction.
- Scatter plots visually show the relationship between variables.
- The correlation coefficient quantifies the correlation but does not prove causation. There may be alternative explanations like a third hidden variable.
- The correlation coefficient captures only linear relationships. A curvilinear pattern can yield a correlation near zero even when the variables are strongly related.
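The key points above can be checked with a small sketch. It assumes the standard sample formula r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²); the function name `pearson_r` and the data are made up for illustration and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation from the standard definitional formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative (invented) data, not taken from the slides
x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # points on an increasing straight line
curved = [v ** 2 for v in x]      # points on a parabola symmetric about 0

r_linear = pearson_r(x, linear)                        # perfect positive linear relation
r_curved = pearson_r(x, curved)                        # exact relation, but not linear
r_rescaled = pearson_r([2.54 * v for v in x], linear)  # unit change on x

print(r_linear, r_curved, r_rescaled)
```

The parabola gives r = 0 even though y is completely determined by x, which is the curvilinear caution in action. Rescaling x by a positive factor, as in a unit conversion, leaves r unchanged.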
Prof. Dr. Moustapha Ibrahim Salem

Correlation
Sir Francis Galton (cousin to Darwin)
• Development of behavioral statistics
• Development of the basis of correlation and regression
• Father of Eugenics
• Science of fingerprints as unique
• Retrospective IQ of 200
• Drove himself mad just to prove you could do it
• Invented the pocket

• Much of statistics is concerned with relationships among variables and whether observed relationships are real or simply due to chance.
• In particular, the simplest case deals with the relationship between two variables.
• When analyzing two variables, one question becomes important because it determines the type of analysis that will be done.

• Is the purpose to explore the nature of the relationship, or is the purpose to use one variable to explain variation in another variable?
• For example, there is a difference between examining height and weight to see if there is a strong relationship, as opposed to using height to predict weight.

• Consequently, you need to distinguish between a correlational analysis, in which only the strength of the relationship is described, and regression, in which one variable is used to predict the values of a second variable.
• In regression, one variable is called the response variable and the other the explanatory variable.
• A response variable (also known as a dependent or Y variable) measures the outcome of a study.
• An explanatory variable (also known as an independent or X variable) is the variable that attempts to explain the observed outcomes.

Correlation
Do Variables Relate to One Another?
• Is worker’s pay related to performance? Positive
• Is exercise related to illness? Negative
• Is CO2 related to global warming? Positive
• Is TV viewing related to shoe size? Zero

Scatter Plots and Correlation
• A scatter plot (or scatter diagram) is used to show the relationship between two variables.
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
  • Only concerned with the strength of the relationship
  • No causal effect is implied

Linear Correlation

Correlation Coefficient
• The population correlation coefficient ρ (rho) measures the strength and direction of the linear association between the variables.
• The sample correlation coefficient r is an estimate of ρ and is used to measure the strength and direction of the linear relationship in the sample observations.

Correlation Coefficient
• The formula for r is: [formula image not preserved]

Correlation Coefficient
• A statistic that quantifies a relation between two variables
• Has no units
• Can be either positive or negative
• Falls between -1.00 and 1.00
• The magnitude of the number (not the sign) indicates the strength of the relation
• ρ and r are unaffected by linear transformations of the individual variables that have a positive slope, e.g. unit changes such as converting from imperial to metric units. Multiplying one variable by a negative constant flips the sign but not the magnitude.

Calculating a Correlation Coefficient

Interpreting Correlations

Cautions
• Random sampling required.
  • Sample correlation coefficients are only valid under simple random samples. If the data were collected in a haphazard fashion or if certain data points were oversampled, then the correlation coefficient may be severely biased.
• There are examples of high correlation but no practical use, and of low correlation but great practical use.

Cautions
• Correlation measures the ‘strength’ of a linear relationship; a curvilinear relationship may have a correlation of 0 even though the variables are strongly related.
• Watch for outliers and high-leverage points.

Cautions
• Effects of lurking variables.
• For example, suppose there is a positive association between wages of male nurses and years of experience, and between wages of female nurses and years of experience, but males are generally paid more than females.
• There is a positive correlation within each group, but the pooled data can show an overall negative correlation, for instance when the higher-paid group tends to have fewer years of experience.

Cautions
• Correlation does not imply causation.
• This is the most frequent mistake people make. There is a set of principles of causal inference that need to be satisfied in order to imply cause and effect.

Correlation and Causation
• The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.
• If there is a significant correlation between two variables, you should consider the following possibilities.

Correlation and Causation
1. Is there a direct cause-and-effect relationship between the variables?
  • Does x cause y?
2. Is there a reverse cause-and-effect relationship between the variables?
  • Does y cause x?
3. Is it possible that the relationship between the variables is caused by a third variable or by a combination of several other variables?
4. Is it possible that the relationship between the two variables is a coincidence?

Establishing cause-and-effect
• It is generally agreed that most or all of the following must be considered before causation can be declared.
• Strength of the association
  • The stronger an observed association appears over a series of different studies, the less likely this association is spurious because of bias.

Establishing cause-and-effect
• Dose-response effect
  • The value of the response variable changes in a meaningful way with the dose (or level) of the suspected causal agent.
• Lack of temporal ambiguity
  • The hypothesized cause precedes the occurrence of the effect.
  The ability to establish this time pattern will depend upon the study design used.

Establishing cause-and-effect
• Consistency of the findings
  • Most, or all, studies concerned with a given causal hypothesis produce similar findings. Of course, studies dealing with a given question may all have serious bias problems that can diminish the importance of observed associations.
• Specificity of the association
  • The observed effect is associated with only the suspected cause (or a few other causes that can be ruled out).

Establishing cause-and-effect
• Engineering or theoretical plausibility
  • The hypothesized causal relationship is consistent with current engineering or theoretical knowledge. Note that the current state of knowledge may be insufficient to explain certain findings.
• Coherence of the evidence
  • The findings do not seriously conflict with accepted facts about the outcome variable being studied.

End of Chapter 4
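
The lurking-variable caution from this chapter can be illustrated with a minimal sketch. All numbers below are invented for the illustration and are not data from the slides; the sketch additionally assumes that the higher-paid group has fewer years of experience, which is what makes the pooled correlation turn negative. The helper `pearson_r` is a hypothetical name implementing the standard sample formula.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: wages rise with experience inside each group, but the
# higher-paid group (assumed here) has fewer years of experience.
male_years = [1, 2, 3, 4]
male_wage = [50, 52, 54, 56]
female_years = [8, 9, 10, 11]
female_wage = [30, 32, 34, 36]

r_male = pearson_r(male_years, male_wage)        # positive within the group
r_female = pearson_r(female_years, female_wage)  # positive within the group

# Pooling ignores the lurking variable (group membership)
r_pooled = pearson_r(male_years + female_years, male_wage + female_wage)

print(f"male: {r_male:.2f}, female: {r_female:.2f}, pooled: {r_pooled:.2f}")
```

Within each group the correlation is positive, yet the pooled correlation is negative, so analyzing the groups separately (or plotting them with different symbols on the scatter plot) is a reasonable safeguard.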