Mathematics (5)
March 8, 2025
Abstract
Intuitively, the number of hours a student puts into a subject and the marks they obtain
should be positively correlated: the more hours studied, the higher the marks. To investigate
whether this intuition matches reality, this paper presents the outcome of a survey carried out
among students of Grades 5 to 8 at BNKS, who were asked about their study hours and the marks
they obtain in Mathematics. The results are analyzed using mathematical tools such as
correlation, regression, the Pearson correlation coefficient, Spearman’s coefficient and the
null hypothesis, and are presented together with their mathematical background.
1 Background
The academic performance of students is a central concern in the field of education, with profound
implications for their future success and societal well-being. A research study by Ibrahim Baba
Suleiman, ’Key factors influencing students’ academic performance’, shows that marks obtained
by students and the number of hours they put in are directly proportional. However, things are
different on the other side of the spectrum. Those at the top in any subject, say Mathematics,
already seem to know everything within the scope of their course, which leads them to dedicate
very little time to the subject.
Both of these views are plausible, and hence, to analyze the situation, I carried out a
survey of students from Grade 5 to 8 and collected the relevant data. Using mathematical tools
such as regression and the Pearson and Spearman correlation coefficients, I was able to
quantify the relationship and make sense of the trend inside BNKS.
1.2 Survey Questions
To give a general sense of the questions the students were asked, the exact questions
(in order) are listed below.
1. Can you please state your name and your roll number?
2. Can you state the marks you obtained in the latest terminal exam in Mathematics subject?
3. Disregarding the time you spend reading Mathematics in the classroom, how
much time would you say you spend on average, in a day, on Mathematics?
Include the time you spend on your prep and quiet as well as any additional time
you open and read your Mathematics book.
4. How many days would you say you study Mathematics in a week on average? If it’s hard to
find an estimate, maybe try evaluating yourself this week, and count the number of days you
have spent reading Mathematics.
These are the exact questions asked verbally to the students; their answers form the raw data.
3. Collected data from various categories of students ranging from ’weak’ to ’toppers’.
The variation across the data collection sample is shown below:
All of the strategies adopted for data collection were chosen to remove and avoid any bias that
could have arisen during data collection. Also note that the data following a pattern of
normal distribution is reasonable evidence that the data is not biased, at least at a
fundamental level. (For the raw data, refer to Appendix-1.)
Its value for any data distribution lies between -1 and 1, where -1 implies a strong negative
correlation (an inverse relation) and 1 implies a strong positive correlation (a direct
relation). It is calculated by the following formula:
\[
\rho = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}
\]
where ρ is the Pearson coefficient, Cov(x, y) is the covariance between x and y, and σx and σy
are the standard deviations of the respective data sets. It is commonly written in the form:
\[
\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}
\]
where x̄ and ȳ represent the means of the respective data distributions.
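The deviation form of the formula can be sketched directly in code. The following is a minimal illustration, not the procedure used in the survey; the function name `pearson` and the toy data are assumptions introduced here for the example.

```python
import math

def pearson(xs, ys):
    """Pearson coefficient in deviation form:
    sum of (x - mean_x)(y - mean_y) divided by the product of the
    square roots of the summed squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("a constant sample has no defined correlation")
    return numerator / denominator

# Hypothetical toy data (not taken from the survey)
toy_marks = [10, 20, 30, 40]
toy_minutes = [30, 60, 90, 120]
print(pearson(toy_marks, toy_minutes))  # close to 1 for perfectly linear data
```

Because the toy values lie exactly on a straight line, the result is (up to floating-point rounding) 1; reversing one of the lists gives a value close to -1.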
\[
r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}
\]
where n is the number of data points and d = R(x) − R(y) is the difference between the rank of
a data value x and the rank of its corresponding y value. Note that:
\[
-1 \le r_s \le 1
\]
where -1 represents a strong inverse correlation and 1 represents a strong direct correlation.
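The rank-difference formula can be sketched as follows. This is a minimal illustration under stated assumptions: tied values are given the mean of the ranks they occupy, which may differ from the ranking convention used in the tables of this paper, and the helper names `average_ranks` and `spearman` are introduced here only for the example. Note also that the formula is exact only when there are no ties.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean rank.
    Assumption: this tie rule is chosen for illustration only."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman coefficient via r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical toy data: monotonic but not linear
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0
```

The toy data increase together without being linear, so the ranks agree exactly and the coefficient is 1 even though the points do not lie on a straight line.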
Introducing these two coefficients in our research is a good way to estimate the correlation
of the data. Both linear and monotonic relationships in the data will be analyzed in this way.
4 Mathematical Analysis of Data
Using the aforementioned statistical measures, we can estimate the correlation between
marks and time allocated. Since these methods cover both linear (Pearson) and
monotonic (Spearman) relations, they let us explore the trend of the data.
Roll Marks Time allocated (mins) X − x̄ (X − x̄)² Y − ȳ (Y − ȳ)²
1093 45 60 7.71 59.4441 -4.2 17.64
1127 45 45 7.71 59.4441 -19.2 368.64
2007 45 30 7.71 59.4441 -34.2 1169.64
3061 46 45 8.71 75.8641 -19.2 368.64
3071 47 30 9.71 94.2841 -34.2 1169.64
3085 47 30 9.71 94.2841 -34.2 1169.64
1107 47 120 9.71 94.2841 55.8 3113.64
1125 47 40 9.71 94.2841 -24.2 585.64
2019 47 45 9.71 94.2841 -19.2 368.64
1150 48 105 10.71 114.7041 40.8 1664.64
3083 48.5 120 11.21 125.6641 55.8 3113.64
1049 50 45 12.71 161.5441 -19.2 368.64
1096 50 90 12.71 161.5441 25.8 665.64
2002 50 30 12.71 161.5441 -34.2 1169.64
1155 55 60 17.71 313.6441 -4.2 17.64
Hence, from the previous discussion, the Pearson Coefficient of Correlation can be calculated
by:
\[
\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}
\]
\[
\rho = \frac{-2195.9}{\sqrt{5767.045}\,\sqrt{68618}}
\]
\[
\rho \approx -0.11
\]
∴ Pearson Coefficient of Correlation = −0.11
This indicates that the data have very little linear relationship, as the value is close to zero.
It also shows a slight negative relationship between the two variables, which seems absurd
given that the relation we are exploring is between marks and time allocated.
But the Pearson Coefficient captures only the linear relation between the variables; the
relationship may instead be monotonic without being linear. To check this, we turn to
Spearman’s Rank Correlation Coefficient.
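The arithmetic behind the Pearson value can be checked with a few lines of code. This is only a sanity check on the three sums quoted in the calculation above; the variable names are introduced here for illustration.

```python
import math

# Sums quoted in the Pearson calculation
sum_products = -2195.9   # sum of (x - mean_x)(y - mean_y)
sum_sq_x = 5767.045      # sum of (x - mean_x)^2
sum_sq_y = 68618         # sum of (y - mean_y)^2

rho = sum_products / (math.sqrt(sum_sq_x) * math.sqrt(sum_sq_y))
print(round(rho, 2))  # -0.11
```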
4.2 Spearman’s Rank Correlation
Spearman’s Rank Correlation is used to measure the monotonic relation between variables. It is
useful for data in which one variable increases as the other increases, but not necessarily
linearly. For this, we need to assign a rank to each data value and then calculate the rank
differences. The distribution table is as follows:
Roll Marks Time allocated (mins) Rank (Marks) Rank (Time) d d²
3085 47 30 23 41 -18 324
1107 47 120 23 4 19 361
1125 47 40 23 33 -10 100
2019 47 45 23 31 -8 64
1150 48 105 22 8 14 196
3083 48.5 120 21 4 17 289
1049 50 45 20 31 -11 121
1096 50 90 20 12 8 64
2002 50 30 20 41 -21 441
1155 55 60 19 23 -4 16
From the table, where d refers to the rank difference between marks and the time allocated,
we obtain:
\[
\sum d^2 = 18012
\]
Now, as discussed previously, if rs is Spearman’s Rank Coefficient, then it is given by:
\[
r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}
\]
\[
r_s = 1 - \frac{6 \times 18012}{50(50^2 - 1)}
\]
\[
r_s \approx 0.13
\]
This shows a slightly better relationship, as rs > 0; however, the value is too small to be
significant, and hence the two variables are only very weakly related in a monotonic sense.
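The Spearman arithmetic can be checked in the same way. This sketch only substitutes the quoted values of n and the sum of squared rank differences into the formula; the variable names are introduced here for illustration.

```python
n = 50           # number of students in the sample
sum_d2 = 18012   # sum of squared rank differences quoted above

r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))
print(round(r_s, 3))  # 0.135
```

To three decimal places the value is 0.135, which the text reports as 0.13.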
Figure 2: Scatter plot of marks and time allocated without consistency Factor
This shows that the data are only weakly correlated; moreover, Pearson’s Coefficient is
negative.
5.1 Consistency
Consistency is a key factor that we failed to account for in the data so far.
Consistency plays a vital role, because a person might study for 150+ mins once a week and
still study less in total than someone who studies for 45+ mins a day on 4 days a week. Hence,
the consistency factor matters, and here is how we integrate it into our data.
6 Consistency Factor
We define the consistency factor as:
Consistency Factor (c) = number of days in a week on which the student devotes time to Mathematics.
Here is how we integrate it in our case. We find how much time, on average, the student spends
on Mathematics per day over the whole week:
\[
\text{Consistent Time } (Y) = \text{Old time allocated} \times \frac{c}{7}
\]
This takes into account any inconsistent and non-linear trends. The new data distribution
table is given below.
Note that the new Y is the consistent time.
Roll Marks Time Allocated (mins) Consistency Consistent Time X − x̄ (X − x̄)² Y − ȳ (Y − ȳ)²
1085 11 45 2 12.86 -26.29 691.16 -29.08 845.73
1118 12 120 1 17.14 -25.29 639.58 -24.80 614.83
2107 18 45 3 19.29 -19.29 372.10 -22.65 513.15
3081 19 120 7 120.00 -18.29 334.52 78.06 6093.59
1010 20 30 2 8.57 -17.29 298.94 -33.37 1113.37
1078 21 30 3 12.86 -16.29 265.36 -29.08 845.73
1016 21 30 2 8.57 -16.29 265.36 -33.37 1113.37
1137 21 120 4.5 77.14 -16.29 265.36 35.20 1239.34
1014 25 30 2.3 9.86 -12.29 151.04 -32.08 1029.22
1021 27 120 5 85.71 -10.29 105.88 43.78 1916.31
1001 30 150 4.5 96.43 -7.29 53.14 54.49 2969.16
1136 30 120 4 68.57 -7.29 53.14 26.63 709.31
1115 31 90 3 38.57 -6.29 39.56 -3.37 11.34
3112 32 30 7 30.00 -5.29 27.98 -11.94 142.53
2046 32 45 4 25.71 -5.29 27.98 -16.22 263.23
1128 33 90 4 51.43 -4.29 18.40 9.49 90.06
3110 34 20 7 20.00 -3.29 10.82 -21.94 481.30
3065 36 30 7 30.00 -1.29 1.66 -11.94 142.53
3079 36 15 7 15.00 -1.29 1.66 -26.94 725.69
1091 36 90 4 51.43 -1.29 1.66 9.49 90.06
2067 37 30 2 8.57 -0.29 0.08 -33.37 1113.37
1116 38 60 4 34.29 0.71 0.50 -7.65 58.57
1006 38 120 2 34.29 0.71 0.50 -7.65 58.57
2083 39 30 5 21.43 1.71 2.92 -20.51 420.66
2154 40 45 4 25.71 2.71 7.34 -16.22 263.23
1009 40 120 7 120.00 2.71 7.34 78.06 6093.59
2059 41 30 5 21.43 3.71 13.76 -20.51 420.66
1157 41 45 2.3 14.79 3.71 13.76 -27.15 737.28
3115 42 60 7 60.00 4.71 22.18 18.06 326.22
3139 43 30 7 30.00 5.71 32.60 -11.94 142.53
1122 43 90 7 90.00 5.71 32.60 48.06 2309.90
3078 44 90 7 90.00 6.71 45.02 48.06 2309.90
3082 44 30 7 30.00 6.71 45.02 -11.94 142.53
1138 45 105 7 105.00 7.71 59.44 63.06 3976.75
1093 45 60 3.4 29.14 7.71 59.44 -12.80 163.73
1127 45 45 4 25.71 7.71 59.44 -16.22 263.23
2007 45 30 2 8.57 7.71 59.44 -33.37 1113.37
3061 46 45 7 45.00 8.71 75.86 3.06 9.37
3071 47 30 7 30.00 9.71 94.28 -11.94 142.53
3085 47 30 7 30.00 9.71 94.28 -11.94 142.53
1107 47 120 3 51.43 9.71 94.28 9.49 90.06
1114 47 60 2 17.14 9.71 94.28 -24.80 614.83
1125 47 40 2.3 13.14 9.71 94.28 -28.80 829.19
2019 47 45 3 19.29 9.71 94.28 -22.65 513.15
1150 48 105 4 60.00 10.71 114.70 18.06 326.22
3083 48.5 120 7 120.00 11.21 125.66 78.06 6093.59
1049 50 45 6 38.57 12.71 161.54 -3.37 11.34
1096 50 90 7 90.00 12.71 161.54 48.06 2309.90
2002 50 30 2 8.57 12.71 161.54 -33.37 1113.37
1155 55 60 3 25.71 17.71 313.64 -16.22 263.23
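The Consistent Time column in the table above can be reproduced with a short sketch. The function name `consistent_time` is an assumption introduced here; the three sample rows are copied from the table, and the last two calls use the 150 mins versus 45 mins illustration from the Consistency section.

```python
def consistent_time(minutes_per_study_day, days_per_week):
    """Average minutes per day over the whole week:
    time allocated on a study day multiplied by c / 7."""
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    return minutes_per_study_day * days_per_week / 7

# (roll, time allocated in mins, consistency c) copied from the table
rows = [(1085, 45, 2), (3081, 120, 7), (1137, 120, 4.5)]
for roll, minutes, c in rows:
    print(roll, round(consistent_time(minutes, c), 2))  # 12.86, 120.0, 77.14

# 150 mins once a week versus 45 mins on 4 days a week
print(round(consistent_time(150, 1), 2))  # 21.43
print(round(consistent_time(45, 4), 2))   # 25.71
```

The last two lines show why the adjustment matters: the student with the longer single session ends up with the lower daily average.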
Figure 3: Scatter Plot of marks and time with consistency Factor taken into account
7 Result Analysis
As it turns out, including the consistency factor does not make much of a difference, but at least it
turns the data into the expected, somewhat positive correlation, which makes more sense. Before diving
into conclusions, let’s look at some house-specific data correlations: