
Mathematics (5)

This study investigates the correlation between study hours and academic performance in Mathematics among students in grades 5 to 8 at BNKS. Through a survey and the application of statistical tools like Pearson and Spearman correlation coefficients, the research aims to determine if increased study time correlates with higher marks. The findings suggest a very weak linear relationship between study hours and marks, with a Pearson coefficient of -0.11, indicating minimal correlation.

Uploaded by

Adish Uprety
Copyright © All Rights Reserved

Correlation Between Study Hours and Marks: A BNKS Case Study


Adish Uprety

March 8, 2025

Abstract
In the usual view, the number of hours a student puts into a subject and the marks they obtain should be positively correlated: the more hours, the higher the marks. To investigate whether this expectation matches reality, this paper reports a survey of students in Grades 5 to 8 at BNKS, who were asked about their study hours and the marks they obtain in Mathematics. The results are analyzed using appropriate mathematical tools, namely correlation, regression, the Pearson correlation coefficient, Spearman’s rank correlation coefficient and the null hypothesis.

1 Background
The academic performance of students is a central concern in the field of education, with profound implications for their future success and for societal well-being. A research study by Ibrahim Baba Suleiman, ’Key factors influencing students’ academic performance’, shows that the marks obtained by students and the number of hours they put in are directly proportional. However, things are different at the other end of the spectrum. Students at the top in a subject, say Mathematics, often seem to already know everything within the scope of their course, which leads them to dedicate very little time to the subject.

Both views are plausible. To analyze the situation, I carried out a survey of students from Grades 5 to 8 and collected the relevant data. Using mathematical tools such as regression and the Pearson and Spearman correlation coefficients, I was able to describe the situation mathematically and make sense of the trend inside BNKS.

1.1 Objectives of the survey


The objectives of the survey and the research were as follows:
1. Find whether students at the top of the marks distribution really put more time into the subject.
2. Find whether inadequate study hours really are the reason behind students’ weaker grades.
3. Find whether consistency plays an important role in the grades students secure.
These goals were determined before the survey was carried out.

1.2 Survey Questions
To give a general sense of the questions, below are the exact questions, in order, that the students were asked.

1. Can you please state your name and your roll number?

2. Can you state the marks you obtained in the latest terminal exam in Mathematics subject?

3. Disregarding the time you spend reading Mathematics in the classroom, how much time would you say you spend on average, in a day, on Mathematics? Include the time you spend on your prep and quiet, as well as any additional time you open and read your Mathematics book.

4. How many days would you say you study Mathematics in a week on average? If it’s hard to find an estimate, maybe try evaluating yourself this week, and count the number of days you have spent reading Mathematics.

These questions were the exact questions asked verbally to the students as a source of raw data.

2 Methodology of Data Collection


To remove any biases that might occur during data collection, I adopted the following strategies:
1. Collected data from 5 different houses (P, N, D, M, J).

2. Collected data from boys and girls proportionately.

3. Collected data from various categories of students, ranging from ’weak’ students to ’toppers’.
The variation of the data collection sample space is shown below:

Figure 1: Bar graph showing the variation of the data collection sample space

All of these strategies were adopted to remove or avoid any bias that could have arisen during data collection. Also note that the data roughly follow a normal distribution, which suggests that the sample is not biased, at least at a fundamental level. (For the raw data, refer to Appendix-1.)

3 The Mathematical Tools


Our main aim is to explore the relation between the marks obtained by students in a particular subject and the number of hours they put into that subject.
Correlation and regression are good tools for this, as they measure the extent to which two variables are related. For this purpose we introduce two statistical coefficients:

3.1 Pearson correlation coefficient


In statistics, the Pearson correlation coefficient (PCC) measures the linear correlation between two sets of data. It is the ratio of the covariance of two variables to the product of their standard deviations.

Its value for any data distribution lies between -1 and 1. A value of -1 implies a perfect negative linear correlation (an inverse relation), 1 implies a perfect positive linear correlation (a direct relation), and values near 0 indicate little or no linear relationship. It is calculated by the following formula:

$$\rho = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$

where ρ is the Pearson coefficient, Cov(x, y) is the covariance between x and y, and σx and σy are the standard deviations of the respective data. It is commonly written in the form:

$$\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ represent the means of the respective data distributions.
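The summation form can be checked with a short Python sketch that follows the formula term by term. The function name `pearson` and the small test lists are illustrative assumptions, not data from the survey.

```python
import math


def pearson(x, y):
    """Pearson correlation via the summation formula:
    sum((x - x_mean)(y - y_mean)) / (sqrt(sum((x - x_mean)^2)) * sqrt(sum((y - y_mean)^2)))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))  # numerator
    s_xx = sum((a - x_mean) ** 2 for a in x)
    s_yy = sum((b - y_mean) ** 2 for b in y)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


print(pearson([1, 2, 3], [2, 4, 6]))        # perfectly linear and increasing: close to 1
print(pearson([1, 2, 3, 4], [1, 3, 2, 4]))  # mostly increasing: close to 0.8
```

Floating-point rounding can make the printed values differ from 1 and 0.8 in the last digits.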

3.2 Spearman’s rank correlation coefficient


In statistics, Spearman’s rank correlation coefficient assesses how well the relationship between two variables can be described using a monotonic function. Pearson’s correlation assesses linear relationships. Spearman’s correlation assesses monotonic relationships, whether linear or not, which makes it a useful tool to introduce here. It is calculated as:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where n is the number of data points and d = R(x) − R(y) is the difference between the rank of each x value and the rank of its corresponding y value. This shortcut formula is exact when there are no tied ranks; with ties it is an approximation. Note that:

$$-1 \le r_s \le 1$$

where -1 represents a strong inverse correlation and 1 represents a strong direct correlation.
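Below is a minimal Python sketch of the rank-difference formula. It assumes that tied values receive the average of the ranks they occupy, which is a common convention, and the helper names are hypothetical. Ranks run from 1 for the smallest value upward. Ranking from the largest value instead gives the same $r_s$, as long as both variables are ranked in the same direction.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rank_difference(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), with d = R(x) - R(y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


print(average_ranks([10, 20, 20, 30]))                        # [1.0, 2.5, 2.5, 4.0]
print(spearman_rank_difference([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0: monotonic but not linear
```

When many values are tied, the tie-exact version of Spearman’s coefficient is obtained by applying Pearson’s formula to the average ranks instead of using the d² shortcut.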

Introducing these two coefficients is a good way to estimate the correlation in the data, since both linear and monotonic relationships can be analyzed in this way.

3.3 Regression analysis


As per Wikipedia, in statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the outcome or response variable, or a label in machine learning parlance) and one or more error-free independent variables (often called regressors, predictors, covariates, explanatory variables or features). It also gives a rough indication of how the data are correlated with each other. If the slope of the regression line is positive, the data are positively correlated. Likewise, a negative slope indicates negative correlation.

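The slope could be calculated as:

$$m = \frac{n \sum xy - \sum x \sum y}{n \sum y^2 - \left(\sum y\right)^2}$$

Here x denotes marks and y the time allocated, as in the tables of Section 4. This makes m the slope of marks against time, that is, the change in marks per additional minute of study.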
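The slope formula translates directly into Python. This sketch assumes, as in the text, that x holds marks and y holds time allocated, and the function name is hypothetical. For the same data, the sign of m always matches the sign of Pearson’s coefficient. This is why a positive slope signals positive correlation.

```python
def slope_marks_on_time(marks, time):
    """m = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(y^2) - (sum(y))^2), with x = marks, y = time."""
    n = len(marks)
    if n != len(time) or n < 2:
        raise ValueError("marks and time must have the same length (at least 2)")
    sum_x, sum_y = sum(marks), sum(time)
    sum_xy = sum(x * y for x, y in zip(marks, time))
    sum_y2 = sum(y * y for y in time)
    denominator = n * sum_y2 - sum_y ** 2
    if denominator == 0:
        raise ValueError("all time values are equal; the slope is undefined")
    return (n * sum_xy - sum_x * sum_y) / denominator


# Toy data where marks = 2 * time + 1 exactly, so the slope should be 2.
print(slope_marks_on_time([3, 5, 7, 9], [1, 2, 3, 4]))  # 2.0
```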

3.4 Null Hypothesis


The null hypothesis is the hypothesis that no relationship exists between the two sets of data or variables being analyzed. If the null hypothesis is true, any experimentally observed effect is due to chance alone, hence the term "null".
These three tools, the Pearson correlation coefficient, Spearman’s rank correlation coefficient and regression analysis, will be our basis for evaluating the relation between marks and time dedicated in our sample data.
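The section does not spell out how the null hypothesis would be tested. One common approach assumes roughly normally distributed data and tests H0 (no linear correlation) with the statistic t = r√(n − 2)/√(1 − r²). Under H0, this statistic follows a t distribution with n − 2 degrees of freedom. The sketch below only computes t, and the function name is hypothetical. The comparison value of about 2.01 (two-sided, 5% level, 48 degrees of freedom) would be read from a t table.

```python
import math


def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), for testing H0: no linear correlation."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Pearson coefficients reported later in the paper, with n = 50 students.
for r in (-0.11, 0.09):
    t = correlation_t_statistic(r, 50)
    print(f"r = {r:+.2f}  ->  t = {t:+.2f}")
```

For these two coefficients, t comes out at roughly −0.77 and +0.63. Both are well below 2.01 in absolute size, so under these assumptions the null hypothesis would not be rejected at the 5% level.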

4 Mathematical Analysis of Data
Using the aforementioned statistical measures, we can estimate the correlation between marks and time allocated. Methods that capture both linear (Pearson) and monotonic (Spearman) relations let us explore the trend in the data.

4.1 Pearson’s Coefficient of Correlation


For this we have to calculate the mean marks and the mean time allocated, find how far the data are spread from the mean, and find the standard deviation. To do so, the data are tabulated in a distribution table for Pearson’s coefficient, as follows:

Roll Marks (X) Time allocated (mins) (Y) X − x̄ (X − x̄)² Y − ȳ (Y − ȳ)²


1085 11 45 -26.29 691.1641 -19.2 368.64
1118 12 120 -25.29 639.5841 55.8 3113.64
2107 18 45 -19.29 372.1041 -19.2 368.64
3081 19 120 -18.29 334.5241 55.8 3113.64
1010 20 30 -17.29 298.9441 -34.2 1169.64
1078 21 30 -16.29 265.3641 -34.2 1169.64
1016 21 30 -16.29 265.3641 -34.2 1169.64
1137 21 120 -16.29 265.3641 55.8 3113.64
1014 25 30 -12.29 151.0441 -34.2 1169.64
1021 27 120 -10.29 105.8841 55.8 3113.64
1001 30 150 -7.29 53.1441 85.8 7361.64
1136 30 120 -7.29 53.1441 55.8 3113.64
1115 31 90 -6.29 39.5641 25.8 665.64
3112 32 30 -5.29 27.9841 -34.2 1169.64
2046 32 45 -5.29 27.9841 -19.2 368.64
1128 33 90 -4.29 18.4041 25.8 665.64
3110 34 20 -3.29 10.8241 -44.2 1953.64
3065 36 30 -1.29 1.6641 -34.2 1169.64
3079 36 15 -1.29 1.6641 -49.2 2420.64
1091 36 90 -1.29 1.6641 25.8 665.64
2067 37 30 -0.29 0.0841 -34.2 1169.64
1116 38 60 0.71 0.5041 -4.2 17.64
1006 38 120 0.71 0.5041 55.8 3113.64
2083 39 30 1.71 2.9241 -34.2 1169.64
2154 40 45 2.71 7.3441 -19.2 368.64
1009 40 120 2.71 7.3441 55.8 3113.64
2059 41 30 3.71 13.7641 -34.2 1169.64
1157 41 45 3.71 13.7641 -19.2 368.64
3115 42 60 4.71 22.1841 -4.2 17.64
3139 43 30 5.71 32.6041 -34.2 1169.64
1122 43 90 5.71 32.6041 25.8 665.64
3078 44 90 6.71 45.0241 25.8 665.64
3082 44 30 6.71 45.0241 -34.2 1169.64
1138 45 105 7.71 59.4441 40.8 1664.64
1093 45 60 7.71 59.4441 -4.2 17.64
1127 45 45 7.71 59.4441 -19.2 368.64
2007 45 30 7.71 59.4441 -34.2 1169.64
3061 46 45 8.71 75.8641 -19.2 368.64
3071 47 30 9.71 94.2841 -34.2 1169.64
3085 47 30 9.71 94.2841 -34.2 1169.64
1107 47 120 9.71 94.2841 55.8 3113.64
1114 47 60 9.71 94.2841 -4.2 17.64
1125 47 40 9.71 94.2841 -24.2 585.64
2019 47 45 9.71 94.2841 -19.2 368.64
1150 48 105 10.71 114.7041 40.8 1664.64
3083 48.5 120 11.21 125.6641 55.8 3113.64
1049 50 45 12.71 161.5441 -19.2 368.64
1096 50 90 12.71 161.5441 25.8 665.64
2002 50 30 12.71 161.5441 -34.2 1169.64
1155 55 60 17.71 313.6441 -4.2 17.64

From the table, the following figures are obtained (summation of marks, summation of time allocated, and total number of data):

$$\sum X = 1864.5, \quad \sum Y = 3210, \quad n = N = 50$$

With $\bar{x} = 1864.5/50 = 37.29$ and $\bar{y} = 3210/50 = 64.2$ denoting the means of the respective distributions:

$$\sum (X - \bar{x})(Y - \bar{y}) = -2195.9, \quad \sum (X - \bar{x})^2 = 5767.045, \quad \sum (Y - \bar{y})^2 = 68618$$

And hence, from the previous discussion, the Pearson coefficient of correlation can be calculated by:

$$\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{-2195.9}{\sqrt{5767.045}\,\sqrt{68618}} \approx -0.11$$

∴ Pearson Coefficient of Correlation = −0.11
This indicates that the data have very little linear relationship, as the value is very close to zero. It also suggests a slight negative relationship between the two variables, which is counterintuitive given that we are exploring the relation between marks and time allocated.
But since Pearson’s coefficient captures only linear relationships, perhaps the data are monotonic rather than linear in nature. To check this, we explore Spearman’s rank correlation coefficient.
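As a cross-check, the calculation can be reproduced from the marks and times listed in the tables. The lists cover all 50 students in table order, including roll 1114, which also appears in the consistency table in Section 6. This is a sketch, and the variable names are illustrative. `minutes` holds the Y column.

```python
import math

# Marks (X) and time allocated in minutes (Y) for the 50 surveyed students, in table order.
marks = [11, 12, 18, 19, 20, 21, 21, 21, 25, 27,
         30, 30, 31, 32, 32, 33, 34, 36, 36, 36,
         37, 38, 38, 39, 40, 40, 41, 41, 42, 43,
         43, 44, 44, 45, 45, 45, 45, 46, 47, 47,
         47, 47, 47, 47, 48, 48.5, 50, 50, 50, 55]
minutes = [45, 120, 45, 120, 30, 30, 30, 120, 30, 120,
           150, 120, 90, 30, 45, 90, 20, 30, 15, 90,
           30, 60, 120, 30, 45, 120, 30, 45, 60, 30,
           90, 90, 30, 105, 60, 45, 30, 45, 30, 30,
           120, 60, 40, 45, 105, 120, 45, 90, 30, 60]

n = len(marks)
x_mean = sum(marks) / n
y_mean = sum(minutes) / n
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(marks, minutes))
s_xx = sum((x - x_mean) ** 2 for x in marks)
s_yy = sum((y - y_mean) ** 2 for y in minutes)
rho = s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

print(f"n = {n}, sum X = {sum(marks)}, sum Y = {sum(minutes)}")
print(f"Sxy = {s_xy:.1f}, Sxx = {s_xx:.3f}, Syy = {s_yy:.0f}, rho = {rho:.2f}")
```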

4.2 Spearman’s Rank Correlation
Spearman’s rank correlation measures the monotonic relation in data. It is useful for data that increase with the correlating variable, but perhaps not linearly. For this we need to assign a rank to each data value and then calculate the rank differences. The distribution table is as follows:

Roll Marks Time allocated (mins) Rank (Marks) Rank (Time) d d²


1085 11 45 50 31 19 361
1118 12 120 49 4 45 2025
2107 18 45 48 31 17 289
3081 19 120 47 4 43 1849
1010 20 30 46 41 5 25
1078 21 30 44 41 3 9
1016 21 30 44 41 3 9
1137 21 120 44 4 40 1600
1014 25 30 42 41 1 1
1021 27 120 41 4 37 1369
1001 30 150 39 1 38 1444
1136 30 120 39 4 35 1225
1115 31 90 38 12 26 676
3112 32 30 37 41 -4 16
2046 32 45 37 31 6 36
1128 33 90 36 12 24 576
3110 34 20 35 50 -15 225
3065 36 30 34 41 -7 49
3079 36 15 34 51 -17 289
1091 36 90 34 12 22 484
2067 37 30 33 41 -8 64
1116 38 60 32 23 9 81
1006 38 120 32 4 28 784
2083 39 30 31 41 -10 100
2154 40 45 30 31 -1 1
1009 40 120 30 4 26 676
2059 41 30 29 41 -12 144
1157 41 45 29 31 -2 4
3115 42 60 28 23 5 25
3139 43 30 27 41 -14 196
1122 43 90 27 12 15 225
3078 44 90 26 12 14 196
3082 44 30 26 41 -15 225
1138 45 105 25 8 17 289
1093 45 60 25 23 2 4
1127 45 45 25 31 -6 36
2007 45 30 25 41 -16 256
3061 46 45 24 31 -7 49
3071 47 30 23 41 -18 324
3085 47 30 23 41 -18 324
1107 47 120 23 4 19 361
1114 47 60 23 23 0 0
1125 47 40 23 33 -10 100
2019 47 45 23 31 -8 64
1150 48 105 22 8 14 196
3083 48.5 120 21 4 17 289
1049 50 45 20 31 -11 121
1096 50 90 20 12 8 64
2002 50 30 20 41 -21 441
1155 55 60 19 23 -4 16

From the table, where d refers to the rank difference between marks and time allocated, we can conclude:

$$\sum d^2 = 18212$$

Now, as discussed previously, Spearman’s rank coefficient $r_s$ is given by:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 18212}{50(50^2 - 1)} = 1 - \frac{109272}{124950} \approx 0.13$$
This is closer to expectations, as $r_s > 0$. However, the value is too small to be significant, so the data are only very weakly related in a monotonic sense.
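The arithmetic can be checked by summing the squares of the d column as printed in the table. Roll 1114 is appended at the end with d = 0, since its marks and time both carry rank 23 under the table’s ranking; order does not affect the sum. This sketch reuses the table’s own ranks and does not re-rank the data.

```python
# d = Rank(Marks) - Rank(Time) for each student, copied from the table in order;
# the trailing 0 is roll 1114 (marks rank 23, time rank 23).
d = [19, 45, 17, 43, 5, 3, 3, 40, 1, 37,
     38, 35, 26, -4, 6, 24, -15, -7, -17, 22,
     -8, 9, 28, -10, -1, 26, -12, -2, 5, -14,
     15, 14, -15, 17, 2, -6, -16, -7, -18, -18,
     19, -10, -8, 14, 17, -11, 8, -21, -4, 0]

n = len(d)
sum_d2 = sum(v * v for v in d)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"n = {n}, sum d^2 = {sum_d2}, r_s = {r_s:.2f}")
```

The survey data contain many tied marks and times, so this rank-difference shortcut is only an approximation of the tie-exact coefficient.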

Figure 2: Scatter plot of marks and time allocated without the consistency factor

The plot shows that the data are only weakly correlated with each other; however, Pearson’s coefficient is negative.

5 Why the Odd Behavior of Pearson’s Coefficient?


Pearson’s coefficient measures the linear relationship between data. If the data are slightly non-linear or more monotonic in nature, Pearson’s coefficient is less reliable. However, its negative value still doesn’t make sense. Perhaps I failed to take something into account, such as consistency.

5.1 Consistency
Consistency is a key factor that we may have failed to account for in this data beforehand. It plays a vital role because a student who studies for 150 minutes once a week studies less in total (150 minutes) than someone who studies for 45 minutes a day on 4 days a week (180 minutes). Hence, the consistency factor matters, and here is how we integrate it into our data.

6 Consistency Factor
We define the consistency factor as:
Consistency factor (c) = number of days in a week on which the student devotes time to Mathematics.
Here is how we integrate it in our case. Instead of the time spent on a single study day, we use the average time the student spends on Mathematics per day across the whole week:

$$\text{Consistent time } [Y] = \text{Old time allocated} \times \frac{c}{7}$$
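The scaling can be written as a small Python function. The function and parameter names are hypothetical. The sketch reproduces a few entries of the consistency table and the comparison from Section 5.1.

```python
def consistent_time(minutes_per_day, days_per_week):
    """Average daily minutes over a whole week: minutes on a study day * c / 7."""
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    return minutes_per_day * days_per_week / 7


# Rows from the table: roll 1085 (45 min, c = 2) and roll 1137 (120 min, c = 4.5).
print(round(consistent_time(45, 2), 2))     # 12.86
print(round(consistent_time(120, 4.5), 2))  # 77.14

# Section 5.1 example: 150 min once a week vs 45 min on 4 days a week.
print(round(consistent_time(150, 1), 2), round(consistent_time(45, 4), 2))  # 21.43 25.71
```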

Scaling by c/7 accounts for how regularly each student studies, not just how long a single session lasts. The new data distribution table is shown below.
Note that the new Y is the consistent time.
Roll Marks (X) Time given (mins) Consistency (c) Consistent time (mins) (Y) X − x̄ (X − x̄)² Y − ȳ (Y − ȳ)²
1085 11 45 2 12.86 -26.29 691.16 -29.08 845.73
1118 12 120 1 17.14 -25.29 639.58 -24.80 614.83
2107 18 45 3 19.29 -19.29 372.10 -22.65 513.15
3081 19 120 7 120.00 -18.29 334.52 78.06 6093.59
1010 20 30 2 8.57 -17.29 298.94 -33.37 1113.37
1078 21 30 3 12.86 -16.29 265.36 -29.08 845.73
1016 21 30 2 8.57 -16.29 265.36 -33.37 1113.37
1137 21 120 4.5 77.14 -16.29 265.36 35.20 1239.34
1014 25 30 2.3 9.86 -12.29 151.04 -32.08 1029.22
1021 27 120 5 85.71 -10.29 105.88 43.78 1916.31
1001 30 150 4.5 96.43 -7.29 53.14 54.49 2969.16
1136 30 120 4 68.57 -7.29 53.14 26.63 709.31
1115 31 90 3 38.57 -6.29 39.56 -3.37 11.34
3112 32 30 7 30.00 -5.29 27.98 -11.94 142.53
2046 32 45 4 25.71 -5.29 27.98 -16.22 263.23
1128 33 90 4 51.43 -4.29 18.40 9.49 90.06
3110 34 20 7 20.00 -3.29 10.82 -21.94 481.30
3065 36 30 7 30.00 -1.29 1.66 -11.94 142.53
3079 36 15 7 15.00 -1.29 1.66 -26.94 725.69
1091 36 90 4 51.43 -1.29 1.66 9.49 90.06
2067 37 30 2 8.57 -0.29 0.08 -33.37 1113.37
1116 38 60 4 34.29 0.71 0.50 -7.65 58.57
1006 38 120 2 34.29 0.71 0.50 -7.65 58.57
2083 39 30 5 21.43 1.71 2.92 -20.51 420.66
2154 40 45 4 25.71 2.71 7.34 -16.22 263.23
1009 40 120 7 120.00 2.71 7.34 78.06 6093.59
2059 41 30 5 21.43 3.71 13.76 -20.51 420.66
1157 41 45 2.3 14.79 3.71 13.76 -27.15 737.28
3115 42 60 7 60.00 4.71 22.18 18.06 326.22
3139 43 30 7 30.00 5.71 32.60 -11.94 142.53
1122 43 90 7 90.00 5.71 32.60 48.06 2309.90
3078 44 90 7 90.00 6.71 45.02 48.06 2309.90
3082 44 30 7 30.00 6.71 45.02 -11.94 142.53
1138 45 105 7 105.00 7.71 59.44 63.06 3976.75
1093 45 60 3.4 29.14 7.71 59.44 -12.80 163.73
1127 45 45 4 25.71 7.71 59.44 -16.22 263.23
2007 45 30 2 8.57 7.71 59.44 -33.37 1113.37
3061 46 45 7 45.00 8.71 75.86 3.06 9.37
3071 47 30 7 30.00 9.71 94.28 -11.94 142.53
3085 47 30 7 30.00 9.71 94.28 -11.94 142.53
1107 47 120 3 51.43 9.71 94.28 9.49 90.06
1114 47 60 2 17.14 9.71 94.28 -24.80 614.83
1125 47 40 2.3 13.14 9.71 94.28 -28.80 829.19
2019 47 45 3 19.29 9.71 94.28 -22.65 513.15
1150 48 105 4 60.00 10.71 114.70 18.06 326.22
3083 48.5 120 7 120.00 11.21 125.66 78.06 6093.59
1049 50 45 6 38.57 12.71 161.54 -3.37 11.34
1096 50 90 7 90.00 12.71 161.54 48.06 2309.90
2002 50 30 2 8.57 12.71 161.54 -33.37 1113.37
1155 55 60 3 25.71 17.71 313.64 -16.22 263.23

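From the table it can be concluded that:

$$\sum X = 1864.5, \quad \sum Y = 2096.93, \quad n = N = 50$$

With $\bar{x} = 37.29$ and $\bar{y} = 2096.93/50 \approx 41.94$:

$$\sum (X - \bar{x})(Y - \bar{y}) = 1578.74, \quad \sum (X - \bar{x})^2 = 5767.045, \quad \sum (Y - \bar{y})^2 = 53323.18$$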
As discussed previously:

$$\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{1578.74}{\sqrt{5767.045}\,\sqrt{53323.18}} \approx 0.09$$
∴ When we include consistency as a measure, Pearson’s coefficient finally becomes positive, at 0.09. Though the value is positive, it is still too small to make any real difference. This shows that the data are weakly but positively correlated with each other.
Note: Spearman’s coefficient for this new data, including consistency, is $r_s = 0.14$. Both coefficients suggest that the data are positively related, but only slightly, as the values are very close to 0.
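These figures can be cross-checked in the same way as before, using the marks, times and consistency values listed in the table for all 50 students, in table order. This sketch uses unrounded consistent times, so its sums may differ from the table in the last decimal place. The variable names are illustrative.

```python
import math

# Marks (X), time allocated on a study day (mins) and consistency c (days per week),
# for the 50 students in table order.
marks = [11, 12, 18, 19, 20, 21, 21, 21, 25, 27,
         30, 30, 31, 32, 32, 33, 34, 36, 36, 36,
         37, 38, 38, 39, 40, 40, 41, 41, 42, 43,
         43, 44, 44, 45, 45, 45, 45, 46, 47, 47,
         47, 47, 47, 47, 48, 48.5, 50, 50, 50, 55]
minutes = [45, 120, 45, 120, 30, 30, 30, 120, 30, 120,
           150, 120, 90, 30, 45, 90, 20, 30, 15, 90,
           30, 60, 120, 30, 45, 120, 30, 45, 60, 30,
           90, 90, 30, 105, 60, 45, 30, 45, 30, 30,
           120, 60, 40, 45, 105, 120, 45, 90, 30, 60]
consistency = [2, 1, 3, 7, 2, 3, 2, 4.5, 2.3, 5,
               4.5, 4, 3, 7, 4, 4, 7, 7, 7, 4,
               2, 4, 2, 5, 4, 7, 5, 2.3, 7, 7,
               7, 7, 7, 7, 3.4, 4, 2, 7, 7, 7,
               3, 2, 2.3, 3, 4, 7, 6, 7, 2, 3]

# Consistent time Y = time allocated * c / 7 (unrounded).
consistent = [t * c / 7 for t, c in zip(minutes, consistency)]

n = len(marks)
x_mean = sum(marks) / n
y_mean = sum(consistent) / n
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(marks, consistent))
s_xx = sum((x - x_mean) ** 2 for x in marks)
s_yy = sum((y - y_mean) ** 2 for y in consistent)
rho = s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

print(f"sum Y = {sum(consistent):.2f}, mean Y = {y_mean:.2f}")
print(f"Sxy = {s_xy:.2f}, Syy = {s_yy:.2f}, rho = {rho:.2f}")
```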

Figure 3: Scatter Plot of marks and time with consistency Factor taken into account

7 Result Analysis
As it seems, including the consistency factor doesn’t make much of a difference, but at least it turns the correlation positive, which makes more sense. Before drawing conclusions, let’s look at some house-specific data correlations:

Figure 4: House-specific data correlations
