
Question and answers

Q1. Explain the meaning of inferential statistics. Describe hypothesis testing.


Ans:
1. In most scientific investigations a sample, a small portion of the population under investigation, is used for the study.
2. On the basis of the information contained in the sample, we try to draw conclusions about the population. This process is known as statistical inference.
3. Inferential statistics deals with drawing conclusions about a large group of individuals (the population) on the basis of observations of a few participants from among them, or about events yet to occur on the basis of past events.
4. It provides tools to compute the probabilities of the future behaviour of the subjects.
5. Inferential statistics is the mathematics and logic of how this generalisation from sample to population can be made.
6. Statistical inference treats two different classes of inferential procedures: (1) estimation and (2) hypothesis testing.
7. Estimation uses the statistics obtained from the sample as estimates of the unknown parameters of the population from which the sample is drawn.
8. Hypothesis testing is one of the important areas of statistical analysis; it is often referred to as a statistical decision-making process.
9. When data is collected by sampling, the most important objective of statistical analysis is to draw inferences.
10. Statistical estimation is concerned with methods by which population characteristics are estimated from sample information, while hypothesis testing checks whether our ideas about a population are likely to be true by analysing a sample from that population.

Hypothesis testing:
Hypothesis testing is one of the important areas of statistical analysis.
Sometimes hypothesis testing is referred to as a statistical decision-making process.
In day-to-day situations we are required to take decisions about the population on the basis of sample information.
Hypothesis testing helps us figure out whether our ideas about a population are likely to be true by analysing a sample from that population.
Statement of Hypothesis
1. A statistical hypothesis is a statement, possibly about a population parameter or its probability
distribution, that is validated based on sample information.
2. Experiments often use random samples instead of the entire population, and inferences drawn from the
samples are generalized to the entire population.
3. Before making inferences about the population, it's crucial to recognize that observed results may be due
to chance. Ruling out the chance factor is essential for accurate inferences
4. The probability of chance occurrence is examined by the null hypothesis (H0), which assumes no
differences and that the samples come from the same population with equal means and standard
deviations
5. The alternative hypothesis (H1) serves as a counter proposition to the null hypothesis, proposing that the
samples belong to different populations, their means estimate different parametric means, and there is a
significant difference between their sample means.
6. The alternative hypothesis (H1) is not directly tested; its acceptance or rejection is determined by the
rejection or retention of the null hypothesis (H0).

7. A statistical test yields the probability 'p' of obtaining the observed result if the null hypothesis is true. If 'p' is too low, H0 is rejected and H1 is accepted, indicating that the observed difference is significant.
8. If 'p' is high, H0 is retained, suggesting that the observed difference is likely due to chance rather than a systematic factor.
Q.2. Explain Type I and Type II errors, with suitable examples
Ans: No hypothesis test is 100% certain; since the test is based on probability, there is always a chance of an incorrect conclusion. In hypothesis testing, errors in decision making can occur based on sample data, leading to two types of errors regarding the acceptance or rejection of a null hypothesis.
Type I Error
 When the null hypothesis is true but it is incorrectly rejected.
 The probability of making a Type I error is denoted by 'α' (alpha), which is the level of significance we set for our hypothesis test.
 An α of 0.05 indicates that we are willing to accept a 5% chance of being wrong when we reject the null hypothesis.
Scenario: Medical Test for a Disease
Type I Error (False Positive):
Situation: A medical test is designed to identify a specific disease.
Type I Error Definition: Declaring a healthy person as having the disease.
Example: The test indicates that a person is positive for the disease (rejecting the null hypothesis of being
healthy), but in reality, the person is disease-free.
Type II Error:
 When the null hypothesis is false but it is incorrectly accepted.
 The probability of making a Type II error is denoted by 'β' (beta), which depends on the power of the test.
 The lower the chosen level of significance α for rejecting the null hypothesis, the higher the probability of a Type II error.
 As α decreases, the probability of a Type II error increases. This is associated with a narrowing rejection region and a widening acceptance region.
Type II Error (False Negative):
Situation: Same medical test for the disease.
Type II Error Definition: Declaring a person with the disease as healthy.
Example: The test suggests that a person is negative for the disease (accepting the null hypothesis of being
healthy), but in reality, the person does have the disease.

Measurement of Test Goodness:


 The goodness of a statistical test is measured by the probabilities of Type I and Type II errors.
 For a fixed sample size 'n', there is a trade-off between the Type I (α) and Type II (β) error rates; reducing one tends to increase the other, so simultaneous reductions in both are not possible.
 Increasing the sample size 'n' makes it possible to decrease both α and β.
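This trade-off can be seen directly in simulation. Below is a minimal sketch (not from the source text; the sample size, effect size, and α are illustrative assumptions) that estimates both error rates by running many t tests:

```python
# Illustrative simulation of Type I and Type II error rates (assumed values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 10_000

# Type I error: H0 is true (mean really is 0), so every rejection is a false positive.
false_pos = 0
for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue < alpha:
        false_pos += 1
print(f"Empirical Type I error rate: {false_pos / trials:.3f} (should be near {alpha})")

# Type II error: H0 is false (true mean is 0.3), so every non-rejection is a false negative.
false_neg = 0
for _ in range(trials):
    sample = rng.normal(loc=0.3, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue >= alpha:
        false_neg += 1
print(f"Empirical Type II error rate (beta): {false_neg / trials:.3f}")
```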

Q.3. Explain the meaning and concept of levels of significance. Describe the steps in setting up the level of significance.
ANS: Levels of significance play a crucial role in research experiments, helping researchers determine whether
observed differences are meaningful or just due to chance. These levels serve as critical points on the
probability scale, separating significant from non-significant differences in statistics.

Significance Levels in Research: In social sciences research, the .05 and .01 levels of significance are commonly
used. These levels indicate the confidence with which researchers either reject or retain a null hypothesis. The
choice of significance level influences the rigor of the study, with .01 being a more stringent criterion compared
to .05.
Interpretation of Significance Levels:
.05 Level:
Amount of Confidence: 95%
Interpretation: If the experiment is repeated 100 times, the obtained mean will fall outside the range of µ ±
1.96 SE on only five occasions. In other words, there is a 95% confidence that the observed difference is not
due to chance.
.01 Level:
Amount of Confidence: 99%
Interpretation: If the experiment is repeated 100 times, the obtained mean will fall outside the range of µ ±
2.58 SE on only one occasion. This level is more rigorous, indicating a higher standard of confidence (99%) in
the observed difference.
Critical Values: The values 1.96 and 2.58 are derived from the t table for large samples (where t approaches the normal distribution). The .01 level is a stricter criterion, requiring a larger critical ratio for null hypothesis rejection than the .05 level. Notably, if a t value is significant at the .01 level, it automatically qualifies as significant at the .05 level, but the reverse is not always true.

Conclusion: Understanding levels of significance is vital for researchers to make informed decisions about the
meaningfulness of their findings. The choice between .05 and .01 levels reflects the desired level of confidence,
with lower levels indicating a more stringent criterion for accepting research results. Researchers must carefully
consider these levels to draw valid conclusions and contribute to the reliability of scientific knowledge.

STEPS
1. State the null hypothesis and the alternative hypothesis.
2. Set the criteria for a decision.
3. Set the level of significance, or alpha level, for the hypothesis test.
4. Determine the critical region: the region composed of extreme sample values that are very unlikely outcomes if the null hypothesis is true. The boundaries of the critical region are determined by the alpha level.
5. Collect data and compute the sample statistic using the formula given below.

z = (x̄ − μ) / σx̄

where
x̄ = sample mean
μ = hypothesized population mean
σx̄ = standard error of the mean
6. Make a decision and write down the decision rule.
7. The z-score is called a test statistic. The purpose of a test statistic is to determine whether the result of a research study (the obtained difference) is greater than what would be expected by chance alone:

z = (obtained difference) / (difference expected by chance)

In practical situations, still other aspects are considered while accepting or rejecting a lot. The risks involved for both producer and consumer are compared, the Type I and Type II error rates are fixed, and a decision is reached.
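A minimal sketch of these steps follows (the numbers are illustrative assumptions, not from the text): state H0, set α, compute z from the formula above, and decide.

```python
# Illustrative one-sample z-test; mu0, sigma, n, and x_bar are assumed values.
import math
from scipy import stats

mu0 = 100      # hypothesized population mean (H0: mu = 100)
sigma = 15     # assumed known population standard deviation
n = 64         # sample size
x_bar = 104.5  # observed sample mean
alpha = 0.05   # level of significance

se = sigma / math.sqrt(n)                # standard error of the mean
z = (x_bar - mu0) / se                   # test statistic
z_crit = stats.norm.ppf(1 - alpha / 2)   # two-tailed critical value (1.96 at .05)

print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}")
if abs(z) > z_crit:
    print("Reject H0: the difference is statistically significant.")
else:
    print("Retain H0: the difference may be due to chance.")
```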

Q4. Describe the four major statistical techniques for organising the data.
Ans: There are four major statistical techniques for organising data: i) classification, ii) tabulation, iii) graphical presentation, and iv) diagrammatic presentation.

Classification:
 Arrangement of data in groups based on similarities.
 Organizes raw data to draw conclusions
 Provides a clearer picture of score information.
 Types:
o Ungrouped: Lists scores with tallies for frequency.
o Grouped: Organizes data into classes, showing the number of observations in each class.
 Relative Frequency Distribution: Indicates proportion of cases at each score value.
 Cumulative Frequency Distribution: Shows number of observations less than a particular value.
 Cumulative Relative Frequency Distribution: Expresses cumulative frequency as a proportion of the total
number of cases for each score or class interval.

TABULATION
A frequency distribution can be presented either in the form of a table or in the form of a graph. Tabulation is the process of presenting the classified data in the form of a table. A tabular presentation makes data more intelligible and fit for further statistical analysis. A table is a systematic arrangement of classified data in rows and columns with appropriate headings and sub-headings. The main components of a table are:
 Table number: When there is more than one table in a particular analysis, each table should be marked with a number for reference and identification.
 Title of the table: Every table should have an appropriate title, which describes the content of the table
 Caption: Captions are brief and self-explanatory headings for columns.
 Stub: Stubs stand for brief and self-explanatory headings for rows
 Body of the table: This is the real table and contains numerical information or data in different cells.
 Head note: This is written at the extreme right hand side below the title and explains the units of measurement used in the body of the table.
 Footnote: This is a qualifying statement which is to be written below the table
 Source of data: The source from which data have been taken is to be mentioned at the end of the table.

Graphical Presentation of Data


The purpose of preparing a frequency distribution is to provide a systematic way of "looking at" and understanding data. In the graphical presentation of a frequency distribution, frequencies are plotted on a graph. Commonly used graphs are the histogram, frequency polygon, frequency curve, and cumulative frequency curve.
 HISTOGRAM: One of the most popular methods for presenting a continuous frequency distribution in the form of a graph. In this type of distribution the upper limit of one class is the lower limit of the following class.
 Frequency polygon: A frequency polygon is a line graph that shows the distribution of data by connecting
the midpoints of the tops of bars in a histogram, providing a visual representation of the data pattern.
 Frequency curve: A frequency curve is a smooth freehand curve drawn through the frequency polygon. The objective of smoothing the frequency polygon is to eliminate, as far as possible, the random or erratic fluctuations present in the data.
 Cumulative Frequency Curve or Ogive: An ogive is a line graph that represents the cumulative frequency
of a dataset, displaying how many observations fall below a particular value or within a given range.
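As a rough illustration, the sketch below (using made-up scores and matplotlib; the bin edges are assumptions) draws three of the graphs named above:

```python
# Illustrative histogram, frequency polygon, and ogive from simulated scores.
import numpy as np
import matplotlib.pyplot as plt

scores = np.random.default_rng(1).normal(loc=60, scale=10, size=200)
bins = np.arange(30, 95, 5)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram
counts, edges, _ = axes[0].hist(scores, bins=bins, edgecolor="black")
axes[0].set_title("Histogram")

# Frequency polygon: line through the midpoints of the histogram bars
midpoints = (edges[:-1] + edges[1:]) / 2
axes[1].plot(midpoints, counts, marker="o")
axes[1].set_title("Frequency polygon")

# Ogive: cumulative frequency plotted against upper class limits
axes[2].plot(edges[1:], np.cumsum(counts), marker="o")
axes[2].set_title("Cumulative frequency curve (ogive)")

plt.tight_layout()
plt.show()
```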

Diagrammatic Presentation of Data


 Bar diagram: A bar diagram, also known as a bar chart, is a visual representation of data using rectangular
bars or columns to represent different categories or groups. The length of each bar corresponds to the
quantity or frequency of the data it represents, making it easy to compare values across categories.
 Sub-divided bar diagram: A sub-divided bar diagram is a type of bar chart where each bar is further
divided into segments or sub-parts, representing different components or subcategories within a broader
category. This visualization helps illustrate the composition of each category, showing how it is divided into
various elements.
 Multiple Bar diagram: A multiple bar diagram is a type of bar chart that displays multiple sets of data side
by side, using separate bars for each dataset within a category. This visualization allows for the comparison
of values across different groups, making it easy to observe trends and variations in each category.
 Pie diagram: A pie diagram, also known as a pie chart, is a circular statistical graphic that is divided into
slices to illustrate numerical proportions. Each slice represents a proportionate part of the whole data set,
making it easy to visualize the distribution and relative sizes of different categories or components.

Q5. Explain point biserial correlation and phi coefficient with suitable examples.
ANS: Point biserial correlation is a special case of Pearson's correlation and examines the relationship between a dichotomous variable and a continuous variable.
 Point Biserial Correlation (rpb) is Pearson's product moment correlation between one truly dichotomous variable and one continuous variable. Algebraically, rpb = r.

 One variable is continuous and quantitative (interval or ratio).
 The other is a truly or naturally dichotomous variable (qualitative/binary).
 The dichotomous variable is one that can be divided into two sharply distinguished or mutually exclusive categories.
 A dichotomous variable is a variable with two values, e.g. male/female, smoker/non-smoker.
 These are truly dichotomous variables for which no underlying continuous distribution can be assumed.
 We cannot apply Pearson's r directly, since it requires two continuous variables.
 We assign 1 to one category (e.g. female) and 0 to the other (e.g. male), or use any other binary coding of the dichotomous values.

Many different situations call for analysing the link between a binary variable and a continuous variable, for example:
a) Does Drug A or Drug B work better for depression?
b) Do scores differ between students who passed and students who failed?
c) Are women likely to earn more as nurses?

Assumptions for Point-Biserial correlation


Every statistical method has assumptions. Assumptions mean that your data must satisfy certain properties in
order for statistical method results to be accurate.
The assumptions for Point-Biserial correlation include:
 Continuous and Binary
 Normally Distributed
 No Outliers
 Equal Variances

Formula

The formula for the point biserial correlation coefficient is:

rpb = [(M1 − M0) / Sn] × √(p × q)

where:
 M1 = mean (for the entire test) of the group that received the positive binary variable (i.e. the "1").
 M0 = mean (for the entire test) of the group that received the negative binary variable (i.e. the "0").
 Sn = standard deviation for the entire test.
 p = proportion of cases in the "0" group.
 q = proportion of cases in the "1" group.
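A minimal sketch (with made-up data) of the point-biserial correlation follows; it uses scipy's pointbiserialr and checks it against the formula above:

```python
# Illustrative point-biserial correlation: gender coded 0/1 vs a test score.
import numpy as np
from scipy import stats

gender = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1])       # 1 = female, 0 = male
score = np.array([72, 68, 80, 75, 60, 65, 58, 63, 70, 78])

r_pb, p_value = stats.pointbiserialr(gender, score)
print(f"r_pb = {r_pb:.3f}, p = {p_value:.3f}")

# Equivalent computation from the formula above
m1 = score[gender == 1].mean()    # mean of the "1" group
m0 = score[gender == 0].mean()    # mean of the "0" group
s_n = score.std()                 # standard deviation of all scores (ddof=0)
p = (gender == 0).mean()          # proportion of cases in the "0" group
q = (gender == 1).mean()          # proportion of cases in the "1" group
print(f"by hand: {(m1 - m0) / s_n * np.sqrt(p * q):.3f}")
```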

Phi coefficient: The Pearson's correlation between a dichotomous variable and a continuous variable is known as the point-biserial correlation. When both variables are dichotomous, the calculated Pearson's correlation is termed the phi coefficient (φ). For instance, consider correlating gender and property ownership. If both variables have two levels (e.g., male/female and owns/does not own property), calculating the Pearson's correlation results in the phi coefficient (φ), where both variables take values of 0 or 1.
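A minimal sketch of the phi coefficient on made-up 0/1 data follows; since φ is just Pearson's r on two binary variables, np.corrcoef suffices:

```python
# Illustrative phi coefficient: Pearson's r on two dichotomous variables.
import numpy as np

gender = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1])   # 1 = female, 0 = male
owns = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])     # 1 = owns property

phi = np.corrcoef(gender, owns)[0, 1]
print(f"phi = {phi:.3f}")
```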

Q.6 Explain the characteristics of the normal probability curve.


ANS:
1. The Normal Curve is Symmetrical: The normal curve, also known as the bell curve, is symmetrical. This
means that if you draw a line in the middle of the curve, the two halves on either side will look like mirror
images of each other. So, the left side and the right side of the curve are identical in terms of shape, size,
and slope. This symmetry is what gives the normal curve its distinct and balanced appearance.
2. The Normal Curve is Unimodal: Since there is only one maximum point on the curve, the normal probability curve is unimodal, i.e. it has only one mode.
3. The Maximum Ordinate occurs at the Centre: The maximum height of the ordinate always occurs at the central point of the curve, that is, the mid-point. In the unit normal curve it is equal to 0.3989.
4. The Normal Curve is Asymptotic to the X Axis: The normal probability curve gradually gets closer to the
horizontal axis as it extends away from the middle point, but it never actually touches the axis. This means
the curve goes on indefinitely in both directions, from negative infinity to positive infinity.
5. The Height of the Curve declines Symmetrically: In the normal probability curve the height declines
symmetrically in either direction from the maximum point
6. The Points of Inflection occur at ±1 Standard Deviation (± 1 σ): The normal curve changes from being curved outward (convex) to curved inward (concave) at specific points called points of inflection. If we draw perpendicular lines from these two points of inflection to the horizontal X-axis, they touch the axis at a distance of one standard deviation unit above and below the mean (the central point).
7. The Total Percentage of Area of the Normal Curve within the Two Points of Inflection is fixed: Approximately 68.26% of the area of the curve lies within the limits of ± 1 standard deviation (± 1 σ) from the mean.

[Figure: unit normal curve with maximum ordinate Y = 0.3989 at Z = 0, and 34.13% of the area on each side between the mean and ±1σ.]

8 The Total Area under the Normal Curve may also be considered 100 Percent Probability: The total area under the normal curve may be considered to approach 100 percent probability, interpreted in terms of standard deviations.
9 The Normal Curve is Bilateral: 50% of the area of the curve lies to the left of the maximum central ordinate and 50% lies to the right. Hence the curve is bilateral.
10 The Normal Curve is a Mathematical Model in the Behavioural Sciences, especially in Mental Measurement: This curve is used as a measurement scale. The measurement unit of this scale is ± 1σ (the unit standard deviation).

Q.7 Describe the levels of measurement with suitable examples


ANS:
When choosing a statistical test, it's crucial to consider the level of measurement associated with the
dependent variable. Parametric tests usually require at least interval-level measurement, while non-parametric
techniques can be applied to all levels of measurement and are commonly used with nominal and ordinal level
data.
 Nominal Data: Nominal data is the first level of measurement, and it falls under categorical information. Nominal scales consist of two or more distinct and mutually exclusive categories with no inherent order, such as yes or no, male or female. In nominal measurement, data is sorted into these categories, and the frequency of cases in each category is counted. Importantly, no meaningful numerical values, distances, or ordering are assigned to the variables in nominal measurement. As a result, achieving a 'normal distribution' of the dependent variable is not possible.

In descriptive research within the health sciences, nominal scales are frequently used when gathering
demographic data about target populations, where variables like the presence or absence of pain or agreement
and disagreement are considered.
Example of Nominal Level Measurement:
Does your back problem affect your employment status?
Yes No
Are you limited in how many minutes you are able to walk continuously with or without support (i.e., cane)?
Yes No
Ordinal data
Ordinal data is the second level of measurement, often linked with non-parametric statistics. It provides a
qualitative 'order' of variables in exclusive categories, but it doesn't indicate the value of differences between
positions. For example, in pain scales or stress scales, you can say someone with a score of 5 experiences more
pain than someone with a score of 3, but you can't measure the exact difference. Non-parametric techniques
are used for testing differences between groups and relationships among variables in ordinal data.

Example:
Should there be a war between Israel and Palestine?
Strongly Agree  Agree  Neutral  Disagree  Strongly Disagree

Interval and Ratio Data


Interval level data is the minimum requirement for parametric techniques. Like ordinal data, it is ordered into exclusive categories, but here the divisions between categories are equal. The only difference between interval and ratio data is the presence of a meaningful zero point. In interval data, zero doesn't mean the complete absence of the attribute, so you can't say one value is twice as large as another. For instance, 100 degrees Celsius isn't twice as hot as 50 degrees because zero doesn't represent the complete absence of heat.
Ratio data, the highest level of measurement, has equal intervals between variables and a meaningful zero
point.

Examples include weight, blood pressure, and force. In health science research, multi-item scales are often
used, where individual items can be either nominal or ordinal.

Q8. Elucidate interactional effect. Discuss the merits and demerits of two-way ANOVA
ANS: Interaction or interactional effect refers to the combined impact of two or more independent variables on
a dependent variable. In the context of two-way ANOVA, the consideration and interpretation of these
interaction effects become crucial.
Without considering interaction, the analysis of two-way or three-way ANOVA loses its significance. Interaction
effects help understand how the joint influence of multiple variables affects the dependent variable.
An analogy with the use of two types of fertilizers (Urea and Phosphate) illustrates the concept. While each
fertilizer independently affects crop growth, their combined application in the right ratio may lead to a
significant increase or decrease in crop growth.
Similarly, in psychology and education, two different teaching methods (Treatment A and Treatment B) may
individually impact academic achievement. However, their combined effect (interaction) needs to be
considered to determine whether the impact is enhanced, diminished, or nullified.

The mean values of extroverts and introverts having high and low levels of anxiety:

GROUPS         EXTROVERTS (M1)   INTROVERTS (M2)   TOTAL MEAN
HIGH ANXIETY   13.60             15.40             14.50
LOW ANXIETY    14.20             12.20             13.20
TOTAL MEAN     13.90             13.80             13.85
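A minimal two-way ANOVA sketch with an interaction term follows, using statsmodels; the data frame below is made up for illustration (chosen to mimic the direction of the table above) and is not the original data:

```python
# Illustrative two-way ANOVA with interaction (personality x anxiety).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "personality": ["extro"] * 6 + ["intro"] * 6,
    "anxiety": (["high"] * 3 + ["low"] * 3) * 2,
    "score": [14, 13, 14, 15, 14, 13, 16, 15, 15, 12, 13, 12],
})

# score ~ personality + anxiety + personality:anxiety
model = ols("score ~ C(personality) * C(anxiety)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects and the interaction effect
```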

MERITS AND DEMERITS OF TWO WAY ANOVA


Merits of Two Way Analysis of Variance
The following are the advantages of two way analysis of variance:
 This technique analyses two types of effects, viz. main effects and interaction effects.
 The effects of more than one factor can be analysed with this technique.
 This technique is used for analysing data obtained from factorial designs.
 This technique is used to analyse the data of complex experimental studies.

Demerits or Limitations of Two Way ANOVA


 More Than Two Categories: It works well when you're comparing two things, but if you have more than
two groups, it becomes tricky. The F ratio it gives us shows a general difference, but to figure out which
groups are really different, you might need additional tests like the 't' test.
 Assumptions Needed: It relies on certain assumptions, like one-way ANOVA. If these assumptions aren't
met, the results might not be reliable or accurate.
 Complex and Time-Consuming: It's not a quick and easy method. Analyzing data with two-way ANOVA
takes time and can be challenging.
 Increased Complexity with More Factors: If you're studying many things at once (factors), it gets even
more complicated. Understanding and explaining the results become harder.
 Requires Advanced Skills: Using two-way ANOVA demands strong math and logic skills. You need to be
good at calculations and interpreting results creatively and logically.

The following limitations are found in this technique:


 When there are more than two classifications of a factor or factors under study, the F ratio provides only a global picture of the difference among the main treatment effects. The inference can be made specific by using the 't' test when the F ratio is found significant for a treatment.
 This technique also rests on the assumptions on which one way analysis of variance is based. If these assumptions are not fulfilled, the use of this technique may give spurious results.
 This technique is difficult and time consuming.
 As the number of factors in a study is increased, the complexity of the analysis increases and the interpretation of results becomes more difficult.
 This technique requires high-level arithmetical and calculative ability. Similarly, it requires a high level of imaginative and logical ability to interpret the obtained results.

Q9. What is Skewness ? Explain the factors causing divergence in normal distribution
ANS: Skewness: A distribution is said to be "skewed" when the mean and median fall at different points in the distribution and the balance, i.e. the centre of gravity, is shifted to one side or the other, to the left or to the right. In a normal distribution the mean equals the median exactly, and the skewness is of course zero (SK = 0).
There are two types of skewness which appear in the normal curve.
a) Negative Skewness: A distribution is said to be skewed negatively, or to the left, when scores are massed at the high (right) end of the scale and are spread out more gradually toward the low (left) end. In a negatively skewed distribution the value of the median will be higher than that of the mean.

NEGATIVE SKEWNESS

b) Positive Skewness: Distributions are skewed positively, or to the right, when scores are massed at the low (left) end of the scale and are spread out gradually toward the high (right) end.
POSITIVE SKEWNESS

FACTORS CAUSING THE DIVERGENCE IN THE NORMAL DISTRIBUTION /NORMAL CURVE

The reasons why distributions exhibit skewness and kurtosis are numerous and often complex, but a careful analysis of the data will often reveal the common causes of asymmetry. Some common causes are:
 Selection of the Sample: The choice of subjects can introduce skewness and kurtosis in score
distributions. Small or biased samples may result in skewness. Additionally, scores from small and
homogeneous groups tend to produce narrow and leptokurtic distributions, while scores from small
and highly heterogeneous groups lead to platykurtic distributions.
 Unsuitable or Poorly Made Tests: Inadequate or poorly designed tests can lead to asymmetry in score
distributions. If a test is too easy, scores tend to cluster at the high end, while if the test is excessively
difficult, scores accumulate at the low end of the scale.
 The Trait being Measured is Non-Normal: Skewness or kurtosis or both will appear when there is a real lack of normality in the trait being measured, e.g. interest, attitude, suggestibility, or deaths in old age or early childhood due to certain degenerative diseases.
 Errors in the Construction and Administration of Tests: A poorly constructed test with inadequate
item analysis can result in score distribution asymmetry. Additionally, unclear instructions during test
administration, errors in timing, scoring mistakes, and variations in practice and motivation can
contribute to skewness in the score distribution.
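A minimal sketch (made-up scores) showing how the mean is pulled toward the longer tail and how a skewness coefficient can be computed with scipy:

```python
# Illustrative positively skewed data: scores massed at the low end, long right tail.
import numpy as np
from scipy import stats

scores = np.array([2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 12, 15])

print(f"mean   = {scores.mean():.2f}")      # pulled toward the long right tail
print(f"median = {np.median(scores):.2f}")  # less affected by the tail
print(f"skew   = {stats.skew(scores):.2f}") # > 0 indicates positive skewness
```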

Q.10 Discuss the importance and application of the standard error of the mean.


Ans: The Versatility of the Standard Error in Inferential Statistics
The standard error of a statistic plays a pivotal role in inferential statistics, offering concrete insights that guide researchers toward more definitive conclusions. Its various applications enhance the reliability of sample measurements and facilitate efficient estimation of population parameters.
 Reliability Assessment:
o Explanation: The standard error helps us figure out how trustworthy our sample is when
we're studying a big group. By calculating reliability, we make sure our sample is dependable
and represents the whole bunch accurately.
 Estimation of Population Parameters:
o Explanation: The main job of the standard error is to guess things about the whole
population based on our sample. Since we can't measure everyone, the standard error gives
us a range where we think the real answers are, making our guesses stronger.
 Feasibility in Research Work:
o Explanation: Sometimes it's impossible to study everyone in a big group. The standard error
helps us make good guesses about the whole group without having to measure everyone.
This saves time, energy, and money in our research.
 Sample Size Determination:
o Explanation: The standard error helps us figure out how many people or things we need to
study to get reliable results. It ensures we don't study too few or too many, just the right
amount for good and meaningful outcomes.
 Significance of Group Differences:
o Explanation: When comparing two groups, the standard error helps us see if the differences
we observe are real or just by chance. By removing errors from our measurements, it ensures
that the differences we find are actually important, making our comparisons more
trustworthy.
 In summary, the standard error is like a helpful tool for researchers. It makes sure our samples are
reliable, helps us estimate things about the whole population, makes research more practical, guides
us in choosing the right sample size, and ensures that the differences we find between groups are
meaningful and not just random. It's a super important part of doing good research!

Reliability Assessment:
 Standard error aids in determining the reliability of a sample from a large population. Calculating the
reliability of statistics is straightforward and contributes to the overall dependability of the sample.
Estimation of Population Parameters:
 The primary objective of the standard error is to estimate population parameters. As no sampling
device guarantees perfect representativeness, the standard error formula establishes parameter limits
within a prefixed confidence interval, providing a more robust foundation for inference.
Feasibility in Research Work:
 In situations where the population is unknown or impractical to measure, the standard error makes
estimating population parameters feasible. This not only streamlines the research process but also
proves economical in terms of time, energy, and financial resources.
Sample Size Determination:
 The standard error is instrumental in determining the appropriate size for experimental or survey
studies. This application ensures that the selected sample size is sufficient for achieving meaningful
and reliable results.
Significance of Group Differences:
 Another valuable application lies in assessing the significance of differences between two groups. By
eliminating sampling or chance errors, the standard error aids in accurately determining the true
significance of observed differences, contributing to the credibility of comparative analyses.
In summary, the standard error of statistics emerges as a versatile tool, not only ensuring the reliability of
samples but also facilitating efficient and economical research practices. Its applications span from estimating
population parameters to guiding sample size decisions and enhancing the precision of comparative analyses,
making it an indispensable component of inferential statistical methodologies.
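A minimal sketch (made-up sample) computing the standard error of the mean and using it to set confidence limits on the population mean:

```python
# Illustrative SEM and 95% confidence interval from a small sample.
import numpy as np
from scipy import stats

sample = np.array([52, 48, 55, 60, 51, 47, 58, 54, 49, 56])

mean = sample.mean()
sem = stats.sem(sample)  # s / sqrt(n), with ddof=1
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.2f}, SEM = {sem:.2f}")
print(f"95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```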

Q11. Explain the step-by-step procedure for computation of the Kruskal-Wallis ANOVA with an example.
Ans:
 The Kruskal-Wallis test is also known as the H test or one-way ANOVA on ranks.
 The Kruskal-Wallis test is a non-parametric analogue of ANOVA.
 The Kruskal-Wallis test compares the medians of several (more than two) populations to see whether they are all the same or not.
 It is a non-parametric test, which means it does not assume the observations are normally distributed.
 The samples are collected through randomization.
 The data are in a rank-order format, either because that is the only format in which scores are available or because they have been transformed into a rank-order format from an interval/ratio format.
 The k samples are independent of one another.
 The dependent variable (which is subsequently ranked) is a continuous random variable.
 The underlying distributions from which the samples are derived are identical in shape.
 We compare more than two populations, i.e. one independent variable with more than two groups or levels.

STEP BY STEP PROCEDURE FOR KRUSKAL WALLIS ANOVA


The null and alternative hypotheses may be stated as:
H0: the population medians are equal.
H1: the population medians differ.

1) Rank all the numbers in the entire data set from smallest to largest (using all samples combined);
in the case of ties, use the average of the ranks that the values would have normally been given.

2) Total the ranks for each of the samples; call those totals T1, T2, . . ., Tk, where k is the number of
groups or populations.

3) Calculate the Kruskal-Wallis test statistic:

H = [12 / (N(N + 1))] Σ((ΣR)² / n) − 3(N + 1)

where
N = the total number of cases
n = the number of cases in a given group
(ΣR)² = the square of the sum of the ranks for a given group of subjects
4) Find the p-value.
5) Make your conclusion about whether you can reject H0 by examining the p-value.

Example:
Suppose you want to find out how exam anxiety affects exam scores.
 Exam anxiety is the independent variable, with three levels:
o No anxiety
o Low-medium anxiety
o High anxiety
 Exam scores are the dependent variable.
In this example we do a between-groups analysis to see the effect of anxiety on the exam scores.
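A minimal sketch of the test for this example, with made-up exam scores, using scipy's kruskal:

```python
# Illustrative Kruskal-Wallis H test for three anxiety groups (made-up scores).
from scipy import stats

no_anxiety = [78, 82, 74, 80, 85]
low_medium = [70, 72, 68, 75, 71]
high_anxiety = [60, 58, 65, 62, 55]

h_stat, p_value = stats.kruskal(no_anxiety, low_medium, high_anxiety)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: at least one group's median differs.")
else:
    print("Retain H0: no evidence that the medians differ.")
```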

Q12. Define and describe the coefficient of correlation. Discuss the characteristics and measures of correlation.
Ans: Correlation refers to a process for establishing whether or not relationships exist between two given variables. Through the correlation coefficient, one can get a general idea about whether or not two variables are related. Many measures are available for variables measured at the ordinal or higher levels of measurement, but correlation is the most commonly used approach. The aim is to calculate and interpret correlation coefficients for ordinal and interval level scales.

Methods of correlation will summarize the relationship between two variables in a single number
known as the correlation coefficient. The correlation coefficient is usually shown by the symbol r and
it ranges from -1 to +1.

Correlation coefficient value   Correlation type               Meaning
 1                              Perfect positive correlation   When one variable changes, the other variable changes in the same direction.
 0                              Zero correlation               There is no relationship between the variables.
-1                              Perfect negative correlation   When one variable changes, the other variable changes in the opposite direction.
Correlation coefficients summarize data and help you compare results between studies.
After data collection, you can visualize your data with a scatterplot by plotting one variable on the x-axis and
the other on the y-axis. It doesn’t matter which variable you place on either axis.

Visually inspect your plot for a pattern and decide whether there is a linear or non-linear pattern between
variables. A linear pattern means you can fit a straight line of best fit between the data points, while a non-
linear or curvilinear pattern can take all sorts of different shapes, such as a U-shape or a line with a curve.

If all points are close to this line, the absolute value of your correlation coefficient is high.

If these points are spread far from this line, the absolute value of your correlation coefficient is low.

Note that the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation
coefficient doesn’t help you predict how much one variable will change based on a given change in the other,
because two datasets with the same correlation coefficient value can have lines with very different slopes.

Characteristics of Correlation
 Correlation Direction:
o Negative Correlation: Coefficient is negative, indicating that as one variable increases,
the other decreases predictably.
o Positive Correlation: Coefficient is positive, meaning both variables tend to move in the
same direction.
 Correlation Coefficient Range:
o Coefficients range from -1.00 to +1.00.
o -1.00: Perfect negative relationship (inverse movement).
o +1.00: Perfect positive relationship (similar movement).
o 0.00: No correlation, unpredictable relationship between variables.
 Strength of Relationships:
o Larger coefficients indicate stronger relationships.
o Closer to 0.00: Weaker relationship, less predictability.
o Closer to ±1.00: Stronger relationship, more accurate predictability.

MEASURES OF CORRELATION
 Parametric Statistics
o Pearson product moment correlation coefficient (most widely accepted as the single appropriate statistic for correlation)
 Non-parametric Statistics
o Spearman’s rank order correlation coefficient: “Spearman Rho”
o Kendall’s Tau:
o Chi Square
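A minimal sketch (made-up data) computing the measures named above with scipy:

```python
# Illustrative comparison of Pearson's r, Spearman's rho, and Kendall's tau.
import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 6, 8, 9, 11, 12])
y = np.array([1, 3, 4, 4, 6, 8, 9, 12])

print(f"Pearson r    = {stats.pearsonr(x, y)[0]:.3f}")
print(f"Spearman rho = {stats.spearmanr(x, y)[0]:.3f}")
print(f"Kendall tau  = {stats.kendalltau(x, y)[0]:.3f}")
```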

Q.13 Describe the importance and application of the normal distribution.


Ans: About Normal Distribution:
 Normal distribution is like a useful tool in science.
 It helps us understand how things like the number of boys and girls born or people's heights
and weights are spread out over time.
 It's also handy for studying things like birth rates, marriages, and deaths, as well as
psychological stuff like intelligence and reaction times.
 In science classes like physics and chemistry, it's used to handle mistakes or errors.
In Education:
 In school, normal distribution is important when we test how smart or capable students are.
 Think of it like a model or a blueprint, not the actual scores. It gives us an idea of how scores
might be arranged, but it's not always a perfect match.

APPLICATIONS/USES OF NORMAL DISTRIBUTION CURVE


There are number of applications of normal curve in the field of psychology as well as educational
measurement and evaluation. These are:
i) To determine the percentage of cases (in a normal distribution) within given limits or scores.
ii) To determine the percentage of cases that are above or below a given score or reference point.
iii) To determine the limits of scores which include a given percentage of cases.
iv) To determine the percentile rank of an individual or a student in his own group.
v) To find out the percentile value of an individual on the basis of his percentile rank.
vi) To divide a group into sub-groups according to a certain ability and assign grades.
vii) To compare two distributions in terms of overlapping.
viii) To determine the relative difficulty of test items.
ix) To compare two or more distributions with it, i.e. to say whether a distribution is normal or not and, if not, in what way it diverges from the normal.
x) To convert raw scores into comparable standard normalized scores.

Q.14 Discuss the applications of the normal distribution curve. Describe divergence from normality with suitable diagrams and discuss the factors causing divergence.
ANS: Divergence from Normality:
Type # 1. Skewness:
A distribution is normal when the Mean, Median and Mode coincide and there is a perfect balance between the right and left halves of the figure. But when the Mean, Median and Mode fall at different points in the distribution, and the centre of gravity is shifted to one side, the distribution is said to be skewed. In a normal distribution the Mean equals the Median, so Mean − Median = 0 and the skewness is '0'.

Skewness is "a distribution not having equal probabilities above and below the mean." So, in fact, the greater the gap between the mean and the median, the greater the skewness.
When the scores in a distribution are massed at the high end of the scale, i.e. the right end, and are spread out more gradually towards the left side, the distribution is said to be Negatively Skewed.

In a negatively skewed distribution the Median is greater than the Mean, so when the skewness is negative the mean lies to the left of the Median. Similarly, when the scores in a distribution are massed at the low end of the scale, i.e. the left end, and are spread out more gradually to the right side, the distribution is said to be Positively Skewed.

In a positively skewed distribution the Median is less than the mean. So when the skewness is
positive the mean lies to the right of the Median.

Divergence from Normality: Type # 2. Kurtosis:


Kurtosis means the 'peakedness' or flatness of a frequency distribution compared to the normal distribution. The Collins Dictionary of Statistics defines kurtosis as "the sharpness of a peak on a curve of a probability density function".
The Normal Probability Curve is moderately peaked. If a frequency curve is more peaked or flatter than the NPC, we can say the distribution diverges from normality. Kurtosis is a measure of such divergence.
There are three types of kurtosis:
1. Leptokurtic
2. Mesokurtic
3. Platykurtic
When the frequency distribution is more peaked at the centre than the normal curve, it is called leptokurtic. The percentile coefficient of kurtosis of a leptokurtic curve is less than 0.263.
When the frequency distribution is normally distributed, the curve is mesokurtic. The percentile coefficient of kurtosis of a normal curve is 0.263.
When a frequency distribution is flatter than the normal curve, it is called platykurtic. The percentile coefficient of kurtosis of a platykurtic curve is greater than 0.263.
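A minimal sketch (simulated data) of the three kurtosis types. Note a convention difference: the 0.263 figure above is the percentile coefficient of kurtosis, while scipy reports excess (moment) kurtosis, which is 0 for a normal curve, positive for leptokurtic and negative for platykurtic distributions:

```python
# Illustrative kurtosis of three simulated distributions (excess kurtosis).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_like = rng.normal(size=10_000)     # mesokurtic
peaked = rng.laplace(size=10_000)         # leptokurtic (sharp peak, heavy tails)
flat = rng.uniform(-1, 1, size=10_000)    # platykurtic (flat, light tails)

for name, data in [("normal", normal_like), ("laplace", peaked), ("uniform", flat)]:
    print(f"{name:8s} excess kurtosis = {stats.kurtosis(data):+.2f}")
```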

Q15. CONCEPT OF THE NORMAL DISTRIBUTION CURVE

ANS:
 The normal distribution curve, also known as the Gaussian distribution or bell curve, is a
statistical concept that describes the distribution of a continuous random variable.
 The normal distribution curve is symmetric and forms a bell shape when graphed. The highest point of the curve occurs at the mean, which lies at the centre, and the curve extends equally in both directions.
Central Limit Theorem:
 The normal distribution is a fundamental concept in statistics, particularly due to the Central
Limit Theorem. This theorem states that the distribution of the sum (or average) of a large
number of independent, identically distributed random variables approaches a normal
distribution, regardless of the original distribution.
Parameters:
 The normal distribution is defined by two parameters: the mean (μ), which locates the
center of the curve, and the standard deviation (σ), which measures the spread or dispersion
of the data. The curve is more peaked and narrow with a smaller standard deviation and
flatter and wider with a larger standard deviation.
68-95-99.7 Rule:
 A significant feature of the normal distribution is the empirical rule or the 68-95-99.7 rule.
This rule states that approximately 68% of the data falls within one standard deviation of the
mean, about 95% falls within two standard deviations, and nearly 99.7% falls within three
standard deviations.
Z-Score:
 The Z-score, or standard score, is used to quantify the number of standard deviations a data
point is from the mean in a normal distribution. It helps in understanding the relative
position of a data point within the distribution.
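A minimal sketch verifying the 68-95-99.7 rule and computing a z-score with scipy (the mean of 100 and SD of 15 are illustrative assumptions):

```python
# Illustrative check of the empirical rule and a z-score calculation.
from scipy import stats

for k in (1, 2, 3):
    area = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"P(-{k} < Z < {k}) = {area:.4f}")  # ~0.6827, 0.9545, 0.9973

# Z-score: how many SDs a raw score of 130 lies above a mean of 100 (SD 15)
z = (130 - 100) / 15
print(f"z = {z:.2f}, percentile rank = {stats.norm.cdf(z) * 100:.1f}")
```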

Q16. Discuss point and interval estimation.


Ans: Point Estimation: When we want to estimate the average score (μ) for an entire group based on a sample, we use the mean we got from the sample (x̄) as the estimate. For example, if we got an average score (x̄) of 45.0 from 100 students, we say that is our estimate for the average score of the whole group.
Point estimation is a method used in statistics to make an educated guess, or estimate, about a
population parameter based on a sample statistic. In simpler terms, if we want to know something
about a whole group (like the average height of all students), but it's impractical to measure
everyone, we can take a sample and use the average height from that sample as our best guess for
the entire group's average. The value we get from the sample is a single point estimate. It's like
saying, "Okay, from our sample, we think the average for the whole group is probably around this
number."

Interval Estimation: Now, our estimate might not be exactly right because samples can be a bit off.
To be more confident about how close our estimate is, we create something called a "confidence
interval." This is like a range of scores we think the real average (μ) could be. We have lower and
upper limits in this range, and we can say, for example, we are 95% sure the actual average falls
within this range. It helps us have more confidence in our estimate.
Now, here's the thing about point estimates - they're not perfect. There's always a chance they're a
bit off. To account for this uncertainty, we use interval estimation. Instead of just giving one number
as our estimate, we create a range, or interval, of values that we're pretty sure contains the true
value. This range is called a confidence interval. So, if we say the average height from our sample is
160 cm with a 95% confidence interval, it means we're pretty confident that the true average height
for the entire group falls somewhere between, say, 155 cm and 165 cm. This gives us a sense of how
much we can trust our estimate and helps us communicate the uncertainty associated with our
guess.
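A minimal sketch of both ideas using the example above (the sample mean of 45.0 and n = 100 come from the text; the standard deviation of 10 is an illustrative assumption):

```python
# Illustrative point estimate and 95% confidence interval for the mean.
import math
from scipy import stats

x_bar, s, n = 45.0, 10.0, 100   # s = 10 is an assumed value
se = s / math.sqrt(n)

point_estimate = x_bar                        # single best guess for mu
z = stats.norm.ppf(0.975)                     # 1.96 for 95% confidence
interval = (x_bar - z * se, x_bar + z * se)   # range likely to contain mu

print(f"point estimate: {point_estimate}")
print(f"95% confidence interval: ({interval[0]:.2f}, {interval[1]:.2f})")
```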

Q.17 Discuss in detail partial correlation with a suitable example.


ANS: In statistics, when we want to understand the relationship between two variables, A and B,
while taking into account the influence of another variable, C, we use something called partial
correlation. This means we're looking at the correlation between A and B after we've "controlled for"
or removed the influence of C.
So, if we write r_AB.C, it is read as the correlation between A and B after the influence of C has been taken out. In simpler terms, it's like figuring out how A and B are related while considering the impact of C. This partial correlation helps us focus specifically on the connection between A and B without being skewed by the effects of C. We can extend this idea to control for more variables if needed.

For example, a researcher is interested in computing the correlation between anxiety and academic achievement controlling for intelligence. The correlation between academic achievement (A) and anxiety (B) will then be controlled for intelligence (C). This can be represented as r_Academic Achievement (A), Anxiety (B) . Intelligence (C), i.e. r_AB.C.
To calculate the partial correlation (r_p) we need data on all three variables. The computational formula is:

r_AB.C = (r_AB − r_AC × r_BC) / √[(1 − r_AC²)(1 − r_BC²)]
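A minimal sketch (made-up data) computing r_AB.C from the three pairwise correlations with the formula above:

```python
# Illustrative first-order partial correlation from pairwise Pearson r's.
import numpy as np
from scipy import stats

achievement = np.array([65, 70, 58, 75, 80, 62, 68, 72])          # A
anxiety = np.array([30, 25, 38, 22, 18, 35, 28, 24])              # B
intelligence = np.array([100, 108, 95, 112, 118, 98, 105, 110])   # C

r_ab = stats.pearsonr(achievement, anxiety)[0]
r_ac = stats.pearsonr(achievement, intelligence)[0]
r_bc = stats.pearsonr(anxiety, intelligence)[0]

r_ab_c = (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))
print(f"r_AB = {r_ab:.3f}, r_AB.C = {r_ab_c:.3f}")
```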
Q.18 Describe the basic assumptions in testing the significance of the difference between two sample means.
ANS:
1. The variable or trait being measured or studied is normally distributed in the universe.
2. There is no difference in the means of the two or more populations, i.e. M1 = M2.
3. The samples are drawn from the population using a random method of sample selection.
4. The size of the sample drawn from the population is relatively large.
If there is a violation of or deviation from the above assumptions, we cannot use the 'C.R.' or 't' test of significance. In such conditions there are other methods, which are used for the purpose.

Q.19 Differentiate between partial and part correlation.


Ans: In statistics, the semi-partial correlation (or part correlation), denoted r_sp, is a measure of the correlation between two variables (A and B) with the influence of a third variable (C) removed from only one of the variables.
In partial correlation (r_p = r_AB.C), the effect of the third variable (C) is removed from both variables (A and B).
In semi-partial correlation (r_sp = r_A(B.C)), the effect of the third variable (C) is removed only from one variable (B) and not from both.

For example, consider the correlation between anxiety (A) and academic achievement (B), where
intelligence (C) is a third variable. In partial correlation, we remove the influence of intelligence from
both anxiety and academic achievement. In semi-partial correlation, we only remove the influence of
intelligence from academic achievement.

The formula to compute the semi-partial correlation coefficient is:

r_A(B.C) = (r_AB − r_AC × r_BC) / √(1 − r_BC²)

This formula allows us to quantify the specific correlation between A and B while considering the influence of C on B.
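A minimal sketch (made-up pairwise correlations, chosen for illustration) contrasting the two formulas:

```python
# Illustrative partial vs semi-partial correlation from assumed pairwise r's.
import math

r_ab, r_ac, r_bc = -0.45, -0.20, 0.60   # assumed pairwise correlations

# Partial: intelligence (C) removed from both A and B
r_partial = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Semi-partial: intelligence (C) removed from B only
r_semipartial = (r_ab - r_ac * r_bc) / math.sqrt(1 - r_bc**2)

print(f"partial r_AB.C        = {r_partial:.3f}")
print(f"semi-partial r_A(B.C) = {r_semipartial:.3f}")
```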

Q.20 Discuss the step-by-step procedure for Kendall's Rank Order Correlation.
Ans: The steps for using the Kendall rank order correlation coefficient (τ) are as follows:
 Ranking:
o Rank the observations on the X variable from 1 to N.
o Rank the observations on the Y variable from 1 to N.
 Ordering:
o Arrange the list of N subjects so that the ranks of the subjects on variable X are in
their natural order (1, 2, 3, … N).
o Observe the Y ranks in the order in which they occur when X ranks are in their
natural order.
 Calculate S:
o Determine the value of S, which is the number of agreements in order minus the
number of disagreements in order for the observed order of the Y ranks.
 Use of Formula:
o If there are no ties in either the X or the Y observations, use the formula

τ = 2S / [N(N − 1)]

where S is the score of agreements minus the score of disagreements, and N is the number of objects or individuals ranked on both X and Y.
o If there are ties, modify the formula to account for the ties on each variable:

τ = 2S / √{[N(N − 1) − Tx][N(N − 1) − Ty]}

where S and N are as above,
Tx = Σ t(t − 1), t being the number of tied observations in each group of ties on the X variable, and
Ty = Σ t(t − 1), t being the number of tied observations in each group of ties on the Y variable.
 If the N subjects constitute a random sample from some population, one may test the hypothesis that the variables X and Y are independent in that population. The method for doing so depends on the size of N:
o For N ≤ 10, use the table of upper-tail probabilities for τ, the Kendall rank order correlation coefficient.
o For N > 10 but not more than 30, use the table of critical values for τ.
o For N > 30 (or for intermediate significance levels when 10 < N ≤ 30), compute the value of z associated with τ using the formula given below and use the z table:

z = 3τ √[N(N − 1)] / √[2(2N + 5)]

 If the probability yielded by the appropriate method is equal to or less than the significance level, the null hypothesis may be rejected in favour of the alternative hypothesis.
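A minimal sketch (made-up ranks) using scipy's kendalltau, which applies the tie correction automatically:

```python
# Illustrative Kendall's tau on two sets of ranks.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]   # ranks on X in natural order
y = [2, 1, 4, 3, 6, 5, 8, 7]   # corresponding ranks on Y

tau, p_value = stats.kendalltau(x, y)
print(f"tau = {tau:.3f}, p = {p_value:.3f}")
```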

Q21. Discuss frequency distribution in terms of grouped and ungrouped data. Elucidate the types of frequency distribution.
Ans: An ungrouped frequency distribution may be constructed by listing all score values either from highest to lowest or lowest to highest and placing a tally mark (/) beside each score every time it occurs. The frequency of occurrence of each score is denoted by 'f'.
 Grouped frequency distribution: If there is a wide range of score values in the data, it is difficult to get a clear picture of the data series. In this case a grouped frequency distribution should be constructed to get a clear picture of the data. A grouped frequency distribution is a table that organises data into classes.
 It shows the number of observations from the data set that fall into each of the classes.
 There are three methods for describing the class limits for distribution: (i) Exclusive method,
(ii) Inclusive method and (iii) True or actual class method.
o Exclusive method In this method of class formation, the classes are so formed that
the upper limit of one class become the lower limit of the next class. In this
classification, it is presumed that score equal to the upper limit of the class is
exclusive, i.e., a score of 40 will be included in the class of 40 to 50 and not in a class
of 30 to 40 (30-40, 40-50, 50-60)
o Inclusive method In this method the classes are so formed that the upper limit of
one class does not become the lower limit of the next class. This classification
includes scores, which are equal to the upper limit of the class. Inclusive method is
preferred when measurements are given in whole numbers. (30-39, 40-49, 50-59)
o True or Actual class method: Mathematically, a score is an interval that extends from 0.5 units below to 0.5 units above the face value of the score on a continuum. These class limits are known as true or actual class limits (29.5 to 39.5, 39.5 to 49.5, etc.).
Types of Frequency Distribution
 There are various ways to arrange frequencies of a data array based on the requirement of
the statistical analysis or the study. A couple of them are discussed below:
 i) Relative frequency distribution: A relative frequency distribution is a distribution that indicates the proportion of the total number of cases observed at each score value or interval of score values.
 ii) Cumulative frequency distribution: Sometimes investigator may be interested to know the
number of observations less than a particular value. This is possible by computing the
cumulative frequency. A cumulative frequency corresponding to a class-interval is the sum of
frequencies for that class and of all classes prior to that class.
 iii) Cumulative relative frequency distribution: A cumulative relative frequency distribution is
one in which the entry of any score of class interval expresses that score’s cumulative
frequency as a proportion of the total number of cases
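A minimal sketch (made-up scores) building an ungrouped and a grouped frequency distribution, with relative and cumulative frequencies, using pandas:

```python
# Illustrative ungrouped and grouped frequency distributions.
import pandas as pd

scores = [42, 55, 38, 47, 55, 61, 47, 52, 47, 35, 58, 44, 50, 55, 40]
s = pd.Series(scores)

# Ungrouped: frequency of each distinct score
print(s.value_counts().sort_index())

# Grouped (exclusive method: 30-40, 40-50, 50-60, 60-70)
classes = pd.cut(s, bins=[30, 40, 50, 60, 70], right=False)
freq = classes.value_counts().sort_index()

table = pd.DataFrame({
    "f": freq,
    "relative f": freq / freq.sum(),   # proportion of cases in each class
    "cumulative f": freq.cumsum(),     # cases below each upper class limit
})
print(table)
```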

Q22. Explain linear and non-linear relationships with suitable diagrams. Discuss the steps in computing Pearson's product moment correlation.
Ans: Linear Relationship: One of the basic forms of relationship is the linear relationship. A linear relationship is a relationship between two variables that can be plotted as a straight line. It can be expressed by the following equation:
Y = α + βX
where
Y is the dependent variable (the variable on the y-axis),
α (alpha) is a constant, the Y-intercept of the straight line,
β (beta) is the slope of the line, and
X is the independent variable (the variable on the x-axis).
Non-linear relationships, such as the Yerkes-Dodson Law and Stevens' Power Law, deviate from straight-line patterns. The Yerkes-Dodson Law highlights the non-linear link between stress and performance, indicating poor performance at low and high stress levels but improvement at moderate stress. Curvilinear relationships, encompassing various types like cubic, quadratic, and exponential, cannot be represented as straight lines. While this discussion primarily addresses linear relationships, it is crucial to recognize the existence of diverse relationship types. Stevens' Power Law exemplifies a non-linear connection between sensation (r) and stimulus (s), expressed as r = c·s^b, with the potential for conversion into a linear equation through logarithmic transformation.
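A minimal sketch (synthetic data) contrasting a direct straight-line fit with the logarithmic linearization of Stevens' power law mentioned above:

```python
# Illustrative linearization of a power law r = c * s**b via log-log transform.
import numpy as np

s = np.linspace(1, 100, 50)   # stimulus intensity
r = 2.0 * s**0.6              # sensation under an assumed power law (c=2, b=0.6)

# Fitting a straight line directly to (s, r) misses the curvature...
slope, intercept = np.polyfit(s, r, 1)

# ...but log r = log c + b * log s is exactly linear
b, log_c = np.polyfit(np.log(s), np.log(r), 1)

print(f"direct line: r ~ {slope:.2f}*s + {intercept:.2f}")
print(f"log-log fit: b = {b:.2f}, c = {np.exp(log_c):.2f}")  # recovers b=0.6, c=2
```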
