Question and Answers
Hypothesis testing:
Hypothesis testing is one of the most important areas of statistical analysis. It is sometimes referred to as the statistical decision-making process. In day-to-day situations we are required to make decisions about a population on the basis of sample information.
Hypothesis testing helps us determine whether our ideas about a population are likely to be true by analysing a sample drawn from that population.
Statement of Hypothesis
1. A statistical hypothesis is a statement, possibly about a population parameter or its probability
distribution, that is validated based on sample information.
2. Experiments often use random samples instead of the entire population, and inferences drawn from the
samples are generalized to the entire population.
3. Before making inferences about the population, it is crucial to recognize that observed results may be due to chance. Ruling out the chance factor is essential for accurate inferences.
4. The probability of chance occurrence is examined by the null hypothesis (H0), which assumes no differences, i.e. that the samples come from the same population with equal means and standard deviations.
5. The alternative hypothesis (H1) serves as a counter proposition to the null hypothesis, proposing that the
samples belong to different populations, their means estimate different parametric means, and there is a
significant difference between their sample means.
6. The alternative hypothesis (H1) is not directly tested; its acceptance or rejection is determined by the
rejection or retention of the null hypothesis (H0).
7. The probability 'p' of the null hypothesis being correct is assessed by a statistical test. If 'p' is too low, H0 is rejected and H1 is accepted, indicating a significant observed difference.
8. If 'p' is high, H0 is accepted, suggesting that the observed difference is likely due to chance rather than a variable factor.
Q.2. Explain Type I and Type II errors, with suitable examples
Ans: No hypothesis test is 100% certain. Since the test is based on probability, there is always a chance of an incorrect conclusion. In hypothesis testing, decisions are made from sample data, which can lead to two types of errors regarding the acceptance or rejection of a null hypothesis.
Type I Error
A Type I error occurs when the null hypothesis is true but is incorrectly rejected.
The probability of making a Type I error is denoted by 'α' (alpha), the level of significance we set for our hypothesis test.
An α of 0.05 indicates that we are willing to accept a 5% chance of being wrong when we reject the null hypothesis.
Scenario: Medical Test for a Disease
Type I Error (False Positive):
Situation: A medical test is designed to identify a specific disease.
Type I Error Definition: Declaring a healthy person as having the disease.
Example: The test indicates that a person is positive for the disease (rejecting the null hypothesis of being
healthy), but in reality, the person is disease-free.
Type II Error:
A Type II error occurs when the null hypothesis is false but is incorrectly accepted.
The probability of making a Type II error is denoted by 'β' (beta), which depends on the power of the test.
The lower the chosen level of significance for rejecting the null hypothesis, the higher the probability of a Type II error: as α decreases, the rejection region narrows and the acceptance region widens, so β increases.
Type II Error (False Negative):
Situation: Same medical test for the disease.
Type II Error Definition: Declaring a person with the disease as healthy.
Example: The test suggests that a person is negative for the disease (accepting the null hypothesis of being
healthy), but in reality, the person does have the disease.
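Both error rates can be illustrated by simulation. This sketch assumes a hypothetical true effect of 0.3 SD for the "H0 false" case and samples of n = 30; neither figure comes from the text:

```python
import math
import random

random.seed(42)
CRIT = 1.96          # two-tailed critical z for alpha = 0.05
N = 30               # sample size per simulated study

def z_stat(sample, mu0=0.0, sigma=1.0):
    """z = (sample mean - mu0) / standard error."""
    mean = sum(sample) / len(sample)
    return (mean - mu0) / (sigma / math.sqrt(len(sample)))

# Type I error: H0 is really true (mean 0), yet we reject it.
type1 = sum(abs(z_stat([random.gauss(0.0, 1) for _ in range(N)])) > CRIT
            for _ in range(2000)) / 2000

# Type II error: H0 is really false (mean 0.3), yet we fail to reject it.
type2 = sum(abs(z_stat([random.gauss(0.3, 1) for _ in range(N)])) <= CRIT
            for _ in range(2000)) / 2000

print(f"empirical alpha = {type1:.3f}, empirical beta = {type2:.3f}")
```

The empirical alpha lands near the chosen 0.05, while beta is much larger here because the assumed effect is small relative to the sample size.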
Q.3. Explain the meaning and concept of levels of significance. Describe the steps in setting up the level of significance.
ANS: Levels of significance play a crucial role in research experiments, helping researchers determine whether
observed differences are meaningful or just due to chance. These levels serve as critical points on the
probability scale, separating significant from non-significant differences in statistics.
Significance Levels in Research: In social sciences research, the .05 and .01 levels of significance are commonly
used. These levels indicate the confidence with which researchers either reject or retain a null hypothesis. The
choice of significance level influences the rigor of the study, with .01 being a more stringent criterion compared
to .05.
Interpretation of Significance Levels:
.05 Level:
Amount of Confidence: 95%
Interpretation: If the experiment is repeated 100 times, the obtained mean will fall outside the range of µ ±
1.96 SE on only five occasions. In other words, there is a 95% confidence that the observed difference is not
due to chance.
.01 Level:
Amount of Confidence: 99%
Interpretation: If the experiment is repeated 100 times, the obtained mean will fall outside the range of µ ±
2.58 SE on only one occasion. This level is more rigorous, indicating a higher standard of confidence (99%) in
the observed difference.
Critical Values: The values 1.96 and 2.58 are obtained from the t table for large samples (where t approaches the normal distribution). The .01 level is a stricter criterion, requiring a larger critical ratio for rejecting the null hypothesis than the .05 level. Notably, if a t value is significant at the .01 level, it is automatically significant at the .05 level, but the reverse is not always true.
Conclusion: Understanding levels of significance is vital for researchers to make informed decisions about the
meaningfulness of their findings. The choice between .05 and .01 levels reflects the desired level of confidence,
with lower levels indicating a more stringent criterion for accepting research results. Researchers must carefully
consider these levels to draw valid conclusions and contribute to the reliability of scientific knowledge.
STEPS
1. State the null hypothesis and the alternative hypothesis
2. Set the criteria for a decision.
3. Choose the level of significance (alpha level) for the hypothesis test.
4. Define the critical region: the region composed of extreme sample values that are very unlikely outcomes if the null hypothesis is true. The boundaries of the critical region are determined by the alpha level.
5. Collect data and compute the sample statistic using the formula given below:
z = (x̄ − μ) / σx̄
where
x̄ = sample mean
μ = hypothesized population mean
σx̄ = standard error of the mean (the expected difference between x̄ and μ due to chance)
6. Make a decision and write down the decision rule.
7. The z-score is called a test statistic. The purpose of a test statistic is to determine whether the result of a research study (the obtained difference) is greater than what would be expected by chance alone:
z = (obtained difference) / (difference due to chance)
In practical situations, still other aspects are considered while accepting or rejecting a lot. The risks involved for both producer and consumer are compared, Type I and Type II errors are fixed, and a decision is reached.
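Steps 5 to 7 above can be sketched as a small function. The sample figures (n = 64, mean 54, μ = 50, σ = 12) are hypothetical:

```python
import math

def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Compute the standard error, the z test statistic, and compare
    it against the critical boundary for the chosen alpha level."""
    std_error = sigma / math.sqrt(n)            # sigma_xbar
    z = (sample_mean - mu) / std_error          # obtained / chance difference
    critical = 1.96 if alpha == 0.05 else 2.58  # large-sample critical values
    decision = "reject H0" if abs(z) > critical else "retain H0"
    return z, decision

# Hypothetical study: 64 scores with mean 54, tested against mu = 50, sigma = 12.
z, decision = z_test(54, 50, 12, 64)
print(round(z, 2), decision)   # z = 4 / 1.5 = 2.67 -> reject H0
```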
Q4. Describe the four major statistical techniques for organising the data.
Ans: There are four major statistical techniques for organising the data. These are: i) Classification ii) Tabulation
iii) Graphical Presentation, and iv) Diagrammatical Presentation
Classification:
Arrangement of data in groups based on similarities.
Organizes raw data to draw conclusions
Provides a clearer picture of score information.
Types:
o Ungrouped: Lists scores with tallies for frequency.
o Grouped: Organizes data into classes, showing the number of observations in each class.
Relative Frequency Distribution: Indicates proportion of cases at each score value.
Cumulative Frequency Distribution: Shows number of observations less than a particular value.
Cumulative Relative Frequency Distribution: Expresses cumulative frequency as a proportion of the total
number of cases for each score or class interval.
TABULATION
A frequency distribution can be presented either in the form of a table or in the form of a graph. Tabulation is the process of presenting classified data in the form of a table. A tabular presentation makes data more intelligible and fit for further statistical analysis. A table is a systematic arrangement of classified data in rows and columns with appropriate headings and sub-headings. The main components of a table are:
Table number: When there is more than one table in a particular analysis, each table should be marked with a number for reference and identification.
Title of the table: Every table should have an appropriate title, which describes the content of the table
Caption: Captions are brief and self-explanatory headings for columns.
Stub: Stubs stand for brief and self-explanatory headings for rows
Body of the table: This is the real table and contains numerical information or data in different cells.
Head note: This is written at the extreme right-hand side below the title and explains the units of measurement used in the body of the table.
Footnote: This is a qualifying statement which is to be written below the table
Source of data: The source from which data have been taken is to be mentioned at the end of the table.
Q5. Explain point biserial correlation and phi coefficient with suitable examples.
ANS: Point biserial correlation is a special case of Pearson's correlation; it examines the relationship between a dichotomous variable and a continuous variable.
The point biserial correlation (rpb) is Pearson's product-moment correlation between one truly dichotomous variable and one continuous variable. Algebraically, rpb = r.
It requires:
One continuous, quantitative variable (interval or ratio level).
One truly, or naturally, dichotomous variable (a qualitative/binary variable).
A dichotomous variable is one that can be divided into two sharply distinguished, mutually exclusive categories, i.e. a variable with exactly two values, e.g. male/female or smoker/non-smoker.
These are truly dichotomous variables for which no underlying continuous distribution can be assumed.
We cannot apply Pearson's r directly, since it requires two continuous variables. Instead we code the dichotomous variable numerically, e.g. 1 for female and 0 for male (or any other binary coding).
Many different situations call for analysing the link between a binary variable and a continuous variable, for example:
a) Does Drug A or Drug B work better for depression?
b) Do students who passed differ from those who failed on some continuous measure?
c) Are women likely to earn more as nurses?
Formula
rpb = ((M1 − M0) / Sn) × √(p × q)
where
M1 = mean (on the entire test) of the group that received the positive binary code (the "1" group)
M0 = mean (on the entire test) of the group that received the negative binary code (the "0" group)
Sn = standard deviation for the entire test
p = proportion of cases in the "0" group
q = proportion of cases in the "1" group
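A minimal sketch of the terms above, using made-up scores and the gender coding described earlier (1 = female, 0 = male):

```python
import math

def point_biserial(binary, scores):
    """rpb = ((M1 - M0) / Sn) * sqrt(p * q), matching the terms above.
    `binary` holds the 0/1 codes; `scores` holds the continuous variable."""
    n = len(scores)
    g1 = [s for b, s in zip(binary, scores) if b == 1]
    g0 = [s for b, s in zip(binary, scores) if b == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    mean = sum(scores) / n
    sn = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # SD of the whole test
    p, q = len(g0) / n, len(g1) / n   # proportions in the "0" and "1" groups
    return ((m1 - m0) / sn) * math.sqrt(p * q)

# Hypothetical data: 1 = female, 0 = male; scores on some continuous test.
gender = [1, 1, 1, 0, 0, 0]
score  = [8, 7, 9, 4, 5, 3]
print(round(point_biserial(gender, score), 3))
```

With 0/1 coding, this value agrees with Pearson's r computed on the same pairs, which is the "algebraically, rpb = r" point made above.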
Phi coefficient: Pearson's correlation between a dichotomous variable and a continuous variable is known as the point-biserial correlation. When both variables are dichotomous, the calculated Pearson's correlation is termed the phi coefficient (φ). For instance, consider correlating gender and property ownership. If both variables have two levels (e.g., male/female and owns/does not own property), calculating Pearson's correlation results in the phi coefficient (φ), where both variables take values of 0 or 1.
[Figure: the standard normal curve, with maximum ordinate Y = 0.3989 at z = 0 and 34.13% of the area on each side of the mean within ±1σ.]
8 The Total Area under Normal Curve may be also considered 100 Percent Probability: The total area
under the normal curve may be considered to approach 100 percent probability; interpreted in terms
of standard deviations.
9 The Normal Curve is Bilateral: The 50% area of the curve lies to the left side of the maximum central
ordinate and 50% of the area lies to the right side. Hence the curve is bilateral.
10 The Normal Curve is a Mathematical Model in the Behavioural Sciences, Especially in Mental Measurement: This curve is used as a measurement scale. The measurement unit of this scale is ±1σ (the unit standard deviation).
In descriptive research within the health sciences, nominal scales are frequently used when gathering
demographic data about target populations, where variables like the presence or absence of pain or agreement
and disagreement are considered.
Example of Nominal Level Measurement:
Does your back problem affect your employment status?
Yes No
Are you limited in how many minutes you are able to walk continuously with or without support (i.e., cane)?
Yes No
Ordinal data
Ordinal data is the second level of measurement, often linked with non-parametric statistics. It provides a
qualitative 'order' of variables in exclusive categories, but it doesn't indicate the value of differences between
positions. For example, in pain scales or stress scales, you can say someone with a score of 5 experiences more
pain than someone with a score of 3, but you can't measure the exact difference. Non-parametric techniques
are used for testing differences between groups and relationships among variables in ordinal data.
Example:
Should there be a war between Israel and Palestine?
Strongly Agree Agree Neutral Disagree Strongly disagree
Interval and ratio data: Examples include weight, blood pressure, and force. In health science research, multi-item scales are often used, where individual items can be either nominal or ordinal.
Q8. Elucidate interactional effect. Discuss the merits and demerits of two-way ANOVA
ANS: Interaction or interactional effect refers to the combined impact of two or more independent variables on
a dependent variable. In the context of two-way ANOVA, the consideration and interpretation of these
interaction effects become crucial.
Without considering interaction, the analysis of two-way or three-way ANOVA loses its significance. Interaction
effects help understand how the joint influence of multiple variables affects the dependent variable.
An analogy with the use of two types of fertilizers (Urea and Phosphate) illustrates the concept. While each
fertilizer independently affects crop growth, their combined application in the right ratio may lead to a
significant increase or decrease in crop growth.
Similarly, in psychology and education, two different teaching methods (Treatment A and Treatment B) may
individually impact academic achievement. However, their combined effect (interaction) needs to be
considered to determine whether the impact is enhanced, diminished, or nullified.
The mean values of extroverts and introverts with high and low levels of anxiety:

GROUPS         EXTROVERTS (M1)   INTROVERTS (M2)   TOTAL MEAN
HIGH ANXIETY        13.60             15.40            14.50
LOW ANXIETY         14.20             12.20            13.20
TOTAL MEAN          13.90             13.80            13.85
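The interaction in this table can be seen by comparing the simple effect of personality at each anxiety level; a non-zero difference between those effects is what an interaction means. This sketch just does that arithmetic on the cell means above:

```python
# Cell means from the table: rows = anxiety level, cols = personality type.
means = {("high", "extrovert"): 13.60, ("high", "introvert"): 15.40,
         ("low",  "extrovert"): 14.20, ("low",  "introvert"): 12.20}

# Simple effect of personality (extrovert minus introvert) at each anxiety level.
effect_high = means[("high", "extrovert")] - means[("high", "introvert")]  # -1.8
effect_low  = means[("low",  "extrovert")] - means[("low",  "introvert")]  # +2.0

# A non-zero difference between the simple effects signals an interaction.
interaction = effect_high - effect_low
print(round(interaction, 1))   # -3.8: the effect reverses direction across levels
```

The sign reversal (extroverts score lower under high anxiety but higher under low anxiety) is exactly the "enhanced, diminished, or nullified" combined effect discussed above.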
Q9. What is Skewness ? Explain the factors causing divergence in normal distribution
ANS: Skewness: A distribution is said to be "skewed" when the mean and median fall at different points in the distribution and the balance point, i.e. the centre of gravity, is shifted to one side or the other (to the left or to the right).
In a normal distribution the mean equals the median exactly, and the skewness is of course zero (SK = 0).
There are two types of skewness which appear in the normal curve.
a) Negative skewness: A distribution is said to be skewed negatively, or to the left, when scores are massed at the high end of the scale (the right side of the curve) and are spread out more gradually toward the low end (the left side of the curve). In a negatively skewed distribution the value of the median is higher than the value of the mean.
NEGATIVE SKEWNESS
b) Positive skewness: Distributions are skewed positively, or to the right, when scores are massed at the low end of the scale (the left side) and are spread out gradually toward the high, or right, end.
POSITIVE SKEWNESS
The reasons why distributions exhibit skewness and kurtosis are numerous and often complex, but a careful analysis of the data will often reveal the common causes of asymmetry. Some common causes are:
Selection of the Sample: The choice of subjects can introduce skewness and kurtosis in score
distributions. Small or biased samples may result in skewness. Additionally, scores from small and
homogeneous groups tend to produce narrow and leptokurtic distributions, while scores from small
and highly heterogeneous groups lead to platykurtic distributions.
Unsuitable or Poorly Made Tests: Inadequate or poorly designed tests can lead to asymmetry in score
distributions. If a test is too easy, scores tend to cluster at the high end, while if the test is excessively
difficult, scores accumulate at the low end of the scale.
The Trait being Measured is Non-Normal: Skewness or kurtosis, or both, will appear when there is a real lack of normality in the trait being measured, e.g. interest, attitude, suggestibility, or deaths in old age or early childhood due to certain degenerative diseases.
Errors in the Construction and Administration of Tests: A poorly constructed test with inadequate
item analysis can result in score distribution asymmetry. Additionally, unclear instructions during test
administration, errors in timing, scoring mistakes, and variations in practice and motivation can
contribute to skewness in the score distribution.
Reliability Assessment:
Standard error aids in determining the reliability of a sample from a large population. Calculating the
reliability of statistics is straightforward and contributes to the overall dependability of the sample.
Estimation of Population Parameters:
The primary objective of the standard error is to estimate population parameters. As no sampling
device guarantees perfect representativeness, the standard error formula establishes parameter limits
within a prefixed confidence interval, providing a more robust foundation for inference.
Feasibility in Research Work:
In situations where the population is unknown or impractical to measure, the standard error makes
estimating population parameters feasible. This not only streamlines the research process but also
proves economical in terms of time, energy, and financial resources.
Sample Size Determination:
The standard error is instrumental in determining the appropriate size for experimental or survey
studies. This application ensures that the selected sample size is sufficient for achieving meaningful
and reliable results.
Significance of Group Differences:
Another valuable application lies in assessing the significance of differences between two groups. By
eliminating sampling or chance errors, the standard error aids in accurately determining the true
significance of observed differences, contributing to the credibility of comparative analyses.
In summary, the standard error of statistics emerges as a versatile tool, not only ensuring the reliability of
samples but also facilitating efficient and economical research practices. Its applications span from estimating
population parameters to guiding sample size decisions and enhancing the precision of comparative analyses,
making it an indispensable component of inferential statistical methodologies.
Q8. Explain step by step procedure for computation of Kruskal Wallis ANOVA with an
example
Ans:
The Kruskal-Wallis test is also known as the H test, or one-way ANOVA on ranks.
The Kruskal-Wallis test is a non-parametric analogue of ANOVA.
It compares the medians of several (more than two) populations to see whether they are all the same or not.
It is a non-parametric test, meaning the observations need not be normally distributed.
Its assumptions are:
The samples are collected through randomization.
The data are in rank-order format, either because that is the only format in which scores are available or because they have been transformed into ranks from an interval/ratio format.
The k samples are independent of one another.
The dependent variable (which is subsequently ranked) is a continuous random variable.
The underlying distributions from which the samples are derived are identical in shape.
We compare more than two populations, or one independent variable with more than two groups or levels.
1) Rank all the numbers in the entire data set from smallest to largest (using all samples combined);
in the case of ties, use the average of the ranks that the values would have normally been given.
2) Total the ranks for each of the samples; call those totals T1, T2, . . ., Tk, where k is the number of
groups or populations.
Example:
Suppose you want to find out how exam anxiety affects exam scores.
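The two ranking steps above, combined with the standard H statistic H = 12/(N(N+1)) × Σ Ti²/ni − 3(N+1), can be sketched with hypothetical anxiety-group scores:

```python
def kruskal_wallis_h(*groups):
    """Pool all scores, rank them (average ranks for ties), total the
    ranks per group (T1, T2, ..., Tk), then compute H."""
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1                       # j-i tied values share ranks i+1..j
        avg_rank = (i + 1 + j) / 2       # average of the tied ranks
        for k in range(i, j):
            ranks.setdefault(pooled[k][0], avg_rank)
        i = j
    n = len(pooled)
    totals = sum(sum(ranks[x] for x in grp) ** 2 / len(grp) for grp in groups)
    return 12 / (n * (n + 1)) * totals - 3 * (n + 1)

# Hypothetical exam scores for low / medium / high anxiety groups.
low, med, high = [85, 90, 88], [75, 70, 78], [60, 65, 62]
print(round(kruskal_wallis_h(low, med, high), 2))   # H = 7.2
```

H is then compared with a chi-square critical value with k − 1 degrees of freedom.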
Q9. Define and describe coefficient of correlation. Discuss the characteristics and measures
of correlation
Ans: Correlation refers to a process for establishing whether or not a relationship exists between two given variables. Through the correlation coefficient, one can get a general idea about whether or not two variables are related. Many measures are available for variables measured at the ordinal or higher levels of measurement, but correlation is the most commonly used approach. Correlation coefficients can be calculated and interpreted for both ordinal and interval level scales.
Methods of correlation will summarize the relationship between two variables in a single number
known as the correlation coefficient. The correlation coefficient is usually shown by the symbol r and
it ranges from -1 to +1.
Visually inspect your plot for a pattern and decide whether there is a linear or non-linear pattern between
variables. A linear pattern means you can fit a straight line of best fit between the data points, while a non-
linear or curvilinear pattern can take all sorts of different shapes, such as a U-shape or a line with a curve.
If all points are close to this line, the absolute value of your correlation coefficient is high.
If these points are spread far from this line, the absolute value of your correlation coefficient is low.
Note that the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation
coefficient doesn’t help you predict how much one variable will change based on a given change in the other,
because two datasets with the same correlation coefficient value can have lines with very different slopes.
Characteristics of Correlation
Correlation Direction:
o Negative Correlation: Coefficient is negative, indicating that as one variable increases,
the other decreases predictably.
o Positive Correlation: Coefficient is positive, meaning both variables tend to move in the
same direction.
Correlation Coefficient Range:
o Coefficients range from -1.00 to +1.00.
o -1.00: Perfect negative relationship (inverse movement).
o +1.00: Perfect positive relationship (similar movement).
o 0.00: No correlation, unpredictable relationship between variables.
Strength of Relationships:
o Larger coefficients indicate stronger relationships.
o Closer to 0.00: Weaker relationship, less predictability.
o Closer to ±1.00: Stronger relationship, more accurate predictability.
MEASURES OF CORRELATION
Parametric Statistics
o Pearson product moment correlation coefficient (most widely accepted as the single most appropriate statistic for correlation)
Non-parametric Statistics
o Spearman’s rank order correlation coefficient: “Spearman Rho”
o Kendall’s Tau:
o Chi Square
Q.9 Discuss the applications of normal distribution curve. Describe divergence from
normality with suitable diagram and discuss the factors causing divergence
ANS: Divergence from Normality:
Type # 1. Skewness:
A distribution is normal when the mean, median and mode coincide and there is a perfect balance between the right and left halves of the figure. But when the mean, median and mode fall at different points in the distribution, and the centre of gravity is shifted to one side, the distribution is said to be skewed. In a normal distribution the mean equals the median, so Mean − Median = 0 and the skewness is 0.
Skewness means "a distribution not having equal probabilities above and below the mean." In fact, the greater the gap between the mean and the median, the greater the skewness.
When in a distribution the scores are massed at the high end of the scale i.e. to the right end and are
spread out more gradually towards the left side at that time the distribution is said to be Negatively
Skewed.
In a negatively skewed distribution the Median is greater than the Mean. So when the skewness is
negative the mean lies to the left of the Median. Similarly when in a distribution the scores are
massed at the low end of the scale i.e. to the left end and are spread out more gradually to the right
side at that time the distribution is said to be Positively Skewed.
In a positively skewed distribution the Median is less than the mean. So when the skewness is
positive the mean lies to the right of the Median.
Applications of the normal distribution curve:
The normal distribution curve, also known as the Gaussian distribution or bell curve, is a
statistical concept that describes the distribution of a continuous random variable.
The normal distribution curve is symmetric and forms a bell-shaped curve when graphed.
The highest point of the curve, known as the mean, is at the center, and the curve extends
equally in both directions.
Central Limit Theorem:
The normal distribution is a fundamental concept in statistics, particularly due to the Central
Limit Theorem. This theorem states that the distribution of the sum (or average) of a large
number of independent, identically distributed random variables approaches a normal
distribution, regardless of the original distribution.
Parameters:
The normal distribution is defined by two parameters: the mean (μ), which locates the
center of the curve, and the standard deviation (σ), which measures the spread or dispersion
of the data. The curve is more peaked and narrow with a smaller standard deviation and
flatter and wider with a larger standard deviation.
68-95-99.7 Rule:
A significant feature of the normal distribution is the empirical rule or the 68-95-99.7 rule.
This rule states that approximately 68% of the data falls within one standard deviation of the
mean, about 95% falls within two standard deviations, and nearly 99.7% falls within three
standard deviations.
Z-Score:
The Z-score, or standard score, is used to quantify the number of standard deviations a data
point is from the mean in a normal distribution. It helps in understanding the relative
position of a data point within the distribution.
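The Z-score and the 68-95-99.7 rule can be checked numerically. The IQ figures below (mean 100, SD 15) are a stock illustration, not taken from the text:

```python
import math

def z_score(x, mu, sigma):
    """Standard score: distance of x from the mean in SD units."""
    return (x - mu) / sigma

def area_within(k):
    """Proportion of a normal distribution within k SDs of the mean,
    computed from the error function."""
    return math.erf(k / math.sqrt(2))

print(z_score(130, 100, 15))                       # 2.0: two SDs above the mean
print([round(area_within(k), 4) for k in (1, 2, 3)])  # ~0.68, ~0.95, ~0.997
```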
Point estimation is a method used in statistics to make an educated guess, or estimate, about a
population parameter based on a sample statistic. In simpler terms, if we want to know something
about a whole group (like the average height of all students), but it's impractical to measure
everyone, we can take a sample and use the average height from that sample as our best guess for
the entire group's average. The value we get from the sample is a single point estimate. It's like
saying, "Okay, from our sample, we think the average for the whole group is probably around this
number."
Interval Estimation: Now, our estimate might not be exactly right because samples can be a bit off.
To be more confident about how close our estimate is, we create something called a "confidence
interval." This is like a range of scores we think the real average (μ) could be. We have lower and
upper limits in this range, and we can say, for example, we are 95% sure the actual average falls
within this range. It helps us have more confidence in our estimate.
Now, here's the thing about point estimates - they're not perfect. There's always a chance they're a
bit off. To account for this uncertainty, we use interval estimation. Instead of just giving one number
as our estimate, we create a range, or interval, of values that we're pretty sure contains the true
value. This range is called a confidence interval. So, if we say the average height from our sample is
160 cm with a 95% confidence interval, it means we're pretty confident that the true average height
for the entire group falls somewhere between, say, 155 cm and 165 cm. This gives us a sense of how
much we can trust our estimate and helps us communicate the uncertainty associated with our
guess.
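The interval-estimation idea above can be sketched as point estimate ± z × standard error. The sample mean of 160 cm echoes the example in the text; the n = 25 and σ = 10 cm are assumptions added for the sketch:

```python
import math

def confidence_interval(sample_mean, sigma, n, z=1.96):
    """95% CI: point estimate plus or minus z standard errors
    (z = 1.96 for 95% confidence)."""
    se = sigma / math.sqrt(n)
    return sample_mean - z * se, sample_mean + z * se

# Hypothetical: mean height 160 cm from n = 25, assumed population SD 10 cm.
low, high = confidence_interval(160, 10, 25)
print(round(low, 2), round(high, 2))   # 156.08 163.92
```

So instead of the single point estimate "160 cm", we report that we are 95% confident the population mean lies between about 156 and 164 cm.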
For example, a researcher is interested in computing the correlation between anxiety and academic achievement, controlling for intelligence. Then the correlation between academic achievement (A) and anxiety (B) is controlled for intelligence (C). This can be represented as rAB.C.
To calculate the partial correlation (rP) we need data on all three variables. The computational formula is as follows:
rAB.C = (rAB − rAC × rBC) / √[(1 − rAC²)(1 − rBC²)]
Q13. Describe the basic assumptions in testing of significance of difference between two
sample means
ANS:
1. The variable or trait being measured or studied is normally distributed in the population.
2. There is no difference in the means of the two (or more) populations, i.e. M1 = M2. If these assumptions are violated when testing the significance of the difference between two means, we cannot use the "C.R." or "t" test of significance; in such conditions other methods are used for the purpose.
3. The samples are drawn from the population using a random method of sample selection.
4. The size of the sample drawn from the population is relatively large.
For example, consider the correlation between anxiety (A) and academic achievement (B), where
intelligence (C) is a third variable. In partial correlation, we remove the influence of intelligence from
both anxiety and academic achievement. In semi-partial correlation, we only remove the influence of
intelligence from academic achievement.
Q.15 Discuss the step by step procedure for Kendall's Rank Order Correlation.
Ans: The steps for using the Kendall rank order correlation coefficient (τ) are as follows:
Ranking:
o Rank the observations on the X variable from 1 to N.
o Rank the observations on the Y variable from 1 to N.
Ordering:
o Arrange the list of N subjects so that the ranks of the subjects on variable X are in
their natural order (1, 2, 3, … N).
o Observe the Y ranks in the order in which they occur when X ranks are in their
natural order.
Calculate S:
o Determine the value of S, which is the number of agreements in order minus the
number of disagreements in order for the observed order of the Y ranks.
Use of Formula:
o If there are no ties in either the X or the Y observations, use the formula
τ = 2S / [N(N − 1)],
where S is the number of agreements minus the number of disagreements, and
N is the number of objects or individuals ranked on both X and Y.
If there are ties, modify the formula to account for the ties on each variable:
τ = 2S / { √[N(N − 1) − Tx] × √[N(N − 1) − Ty] }
where S and N are as above,
Tx = Σ t(t − 1), t being the number of tied observations in each group of ties on the X variable, and
Ty = Σ t(t − 1), t being the number of tied observations in each group of ties on the Y variable.
If the N subjects constitute a random sample from some population, one may test the
hypothesis that the variable X and Y are independent in that population. The method for
doing so depends on the size of N:
For N ≤ 10, Table — Upper tail probabilities for T, the Kendall rank order correlation
coefficient
For N > 10, but less than 30, Table – Critical value for T, the Kendall rank order correlation
coefficient
For N ≥ 30 (or for intermediate significance levels when 10 < N ≤ 30), compute the value of z associated with τ using the formula given below and use the z table:
z = 3τ √[N(N − 1)] / √[2(2N + 5)]
If the probability yielded by the appropriate method is equal to or less than the critical
value, null hypothesis may be rejected in the favour of alternative hypothesis.
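For the no-ties case, the procedure above (arrange by X, count agreements minus disagreements among the Y ranks, then τ = 2S / (N(N − 1))) can be sketched as follows, with hypothetical ranks for five subjects:

```python
def kendall_tau(x_ranks, y_ranks):
    """No-ties Kendall tau: S = agreements minus disagreements over all
    pairs once X ranks are in natural order; tau = 2S / (N(N-1))."""
    n = len(x_ranks)
    pairs = sorted(zip(x_ranks, y_ranks))   # put X ranks in natural order
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Is each Y pair in the same order as the corresponding X pair?
            s += 1 if pairs[j][1] > pairs[i][1] else -1
    return 2 * s / (n * (n - 1))

# Hypothetical ranks of 5 subjects on variables X and Y.
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 5, 4]
print(kendall_tau(x, y))   # S = 8 - 2 = 6, so tau = 12/20 = 0.6
```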
Q16. Discuss frequency distribution in terms of grouped and ungrouped data. Elucidate the
types of frequency distribution
Ans: An ungrouped frequency distribution may be constructed by listing all score values, either from highest to lowest or from lowest to highest, and placing a tally mark (/) beside each score every time it occurs. The frequency of occurrence of each score is denoted by 'f'.
Grouped frequency distribution: If there is a wide range of score values in the data, it is difficult to get a clear picture from the raw series. In this case a grouped frequency distribution should be constructed to give a clear picture of the data. A grouped frequency distribution is a table that organises data into classes and shows the number of observations from the data set that fall into each class.
There are three methods for describing the class limits for distribution: (i) Exclusive method,
(ii) Inclusive method and (iii) True or actual class method.
o Exclusive method In this method of class formation, the classes are so formed that
the upper limit of one class become the lower limit of the next class. In this
classification, it is presumed that score equal to the upper limit of the class is
exclusive, i.e., a score of 40 will be included in the class of 40 to 50 and not in a class
of 30 to 40 (30-40, 40-50, 50-60)
o Inclusive method In this method the classes are so formed that the upper limit of
one class does not become the lower limit of the next class. This classification
includes scores, which are equal to the upper limit of the class. Inclusive method is
preferred when measurements are given in whole numbers. (30-39, 40-49, 50-59)
o True or actual class method: Mathematically, a score is an interval that extends from 0.5 units below to 0.5 units above the face value of the score on a continuum. These class limits are known as true or actual class limits (e.g. 29.5 to 39.5, 39.5 to 49.5, etc.).
Types of Frequency Distribution
There are various ways to arrange frequencies of a data array based on the requirement of
the statistical analysis or the study. A couple of them are discussed below:
i) Relative frequency distribution: A relative frequency distribution is a distribution that indicates the proportion of the total number of cases observed at each score value or interval of score values.
ii) Cumulative frequency distribution: Sometimes investigator may be interested to know the
number of observations less than a particular value. This is possible by computing the
cumulative frequency. A cumulative frequency corresponding to a class-interval is the sum of
frequencies for that class and of all classes prior to that class.
iii) Cumulative relative frequency distribution: A cumulative relative frequency distribution is
one in which the entry of any score of class interval expresses that score’s cumulative
frequency as a proportion of the total number of cases
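The exclusive method and the relative/cumulative counts described above can be sketched together; the scores, class width, and starting point below are invented for illustration:

```python
from collections import Counter

def grouped_frequency(scores, width, start):
    """Exclusive-method classes [start, start+width), [start+width, ...), etc.
    Returns (class label, frequency, relative frequency, cumulative frequency)."""
    classes = Counter((s - start) // width for s in scores)
    n = len(scores)
    cum = 0
    rows = []
    for idx in sorted(classes):
        lo = start + idx * width
        f = classes[idx]
        cum += f                       # running total: cumulative frequency
        rows.append((f"{lo}-{lo + width}", f, round(f / n, 2), cum))
    return rows

scores = [32, 35, 41, 44, 47, 52, 55, 58, 43, 49]
for row in grouped_frequency(scores, 10, 30):
    print(row)   # e.g. ('40-50', 5, 0.5, 7)
```

A score equal to an upper limit (say 40) falls in the next class (40-50), which is exactly the exclusive-method convention described above.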
1. Explain linear and nonlinear relationships with suitable diagrams. Discuss the steps in computing Pearson's product moment correlation.
Ans: Linear Relationship: One of the basic forms of relationship is the linear relationship, which can be expressed as a relationship between two variables that can be plotted as a straight line. A linear relationship can be expressed by the following equation:
Y = α + βX
where
Y is the dependent variable (plotted on the y-axis),
α (alpha) is a constant, the Y-intercept of the straight line,
β (beta) is the slope of the line, and
X is the independent variable (plotted on the x-axis).
Non-linear relationships, such as the Yerkes-Dodson Law and Stevens' Power Law, deviate from straight-line patterns. The Yerkes-Dodson Law describes the non-linear link between stress and performance: performance is poor at low and high stress levels but improves at moderate stress. Curvilinear relationships, encompassing various types such as cubic, quadratic, and exponential, cannot be represented as straight lines. While this discussion primarily addresses linear relationships, it is crucial to recognize the existence of diverse relationship types. Stevens' Power Law exemplifies a non-linear connection between sensation (r) and stimulus (s), expressed as r = c·s^b, with the potential for conversion into a linear equation through logarithmic transformation.
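The steps for computing Pearson's product-moment correlation in deviation-score form can be sketched as follows; the paired scores are hypothetical:

```python
import math

def pearson_r(x, y):
    """Product-moment r from deviation scores:
    r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                       # step 1: the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # step 2: cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # step 3: squared deviations
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)                     # step 4: divide

# Hypothetical paired scores on two variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))   # 0.775: a strong positive linear relationship
```

The result always falls between −1 and +1, matching the coefficient range described earlier.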