
Research for Linear Regression


Experimental Results and Discussion

In general, educators believe that class attendance has a considerable impact on course
achievement, all other conditions being equal. To evaluate the relationship between
attendance and performance, an education researcher chooses a multi-section
basic computer science course at a large university. Throughout the semester, the course instructors agree to keep
an accurate record of attendance. For this study, we took data on final-year BCA students at Dr.
Rammanohar Lohia Avadh University. The
sample size is 30: at the end of the semester, we randomly selected 30 students out of 60.
Two measurements were taken for each of the 30 students in the sample, as
given below:
1. The number of days (d) the student was absent from class.
2. The end-of-semester score (s).
Table 1 shows the dataset of 30 students, and Figure 1 depicts a scatter plot of the same data.
Table 1:
Data Set of 30 Students
Figure 1:
Scatter plot of 30 Students Data
Table 2 shows the calculation of d², ds, and s². In Table 2 we have also added up the
numbers in each column; these totals are used in generating the trendline (predicted line).
Table 2:
Calculation of d², ds, and s²

The column totals in Table 2 give the following expressions:
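For reference, the quantities usually formed from such column totals in simple linear regression are sketched below. The symbols SS_dd, SS_ds, and SS_ss are assumed notation that does not appear in the original text, and n is the sample size (30 in this study).

```latex
\begin{align*}
SS_{dd} &= \sum d^2 - \frac{1}{n}\Bigl(\sum d\Bigr)^2, \\
SS_{ds} &= \sum ds - \frac{1}{n}\Bigl(\sum d\Bigr)\Bigl(\sum s\Bigr), \\
SS_{ss} &= \sum s^2 - \frac{1}{n}\Bigl(\sum s\Bigr)^2 .
\end{align*}
```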


We start by determining the least squares regression line (the predicted line), which is the line
that best fits the data. Its intercept and slope are given below:
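The slope and intercept of a least squares line follow from the column totals of d, s, d², and ds. The sketch below shows one way to compute them; the function name `least_squares_line` and the six data points are hypothetical and are not the study's Table 1.

```python
def least_squares_line(d, s):
    """Return (slope, intercept) of the least squares line s = intercept + slope * d."""
    n = len(d)
    sum_d, sum_s = sum(d), sum(s)
    # Column totals of the kind collected in a d^2 / ds table.
    sum_dd = sum(x * x for x in d)
    sum_ds = sum(x * y for x, y in zip(d, s))
    ss_dd = sum_dd - sum_d ** 2 / n
    ss_ds = sum_ds - sum_d * sum_s / n
    slope = ss_ds / ss_dd
    intercept = sum_s / n - slope * sum_d / n
    return slope, intercept


# Hypothetical illustration data (NOT the study's data set).
absences = [0, 1, 2, 3, 5, 7]
scores = [92, 88, 85, 79, 70, 60]
slope, intercept = least_squares_line(absences, scores)
print(f"s = {slope:.2f} d + {intercept:.2f}")
```

A negative slope here plays the same role as in the study: each additional absence lowers the fitted score by the magnitude of the slope.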

The least squares regression line for this data, rounded to two decimal places, is:

Figure 2 depicts the fitted regression or predicted line of 30 students’ data.

Figure 2:
Fitted regression or predicted line of 30 students’ data
Figure 2 also shows a decreasing trend, indicating that students with more absences
perform worse on the final exam on average. The sum of the squared errors (SSE),
which measures this line's goodness of fit to the scatter plot, is:

This is a large number, but it is not particularly informative in and of itself; we use it to
calculate a crucial statistic:
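The statistic in question is the sample standard deviation of errors, which in simple linear regression is the square root of SSE divided by n − 2. A minimal sketch of that calculation follows; the helper names and the data are hypothetical, not taken from the study.

```python
import math


def fit_line(d, s):
    """Least squares slope and intercept for s = intercept + slope * d."""
    n = len(d)
    mean_d, mean_s = sum(d) / n, sum(s) / n
    ss_dd = sum((x - mean_d) ** 2 for x in d)
    ss_ds = sum((x - mean_d) * (y - mean_s) for x, y in zip(d, s))
    slope = ss_ds / ss_dd
    return slope, mean_s - slope * mean_d


def residual_standard_error(d, s):
    """Return (SSE, s_e), where s_e = sqrt(SSE / (n - 2))."""
    slope, intercept = fit_line(d, s)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(d, s))
    return sse, math.sqrt(sse / (len(d) - 2))


# Hypothetical illustration data (NOT the study's data set).
absences = [0, 1, 2, 3, 5, 7]
scores = [92, 88, 85, 79, 70, 60]
sse, s_e = residual_standard_error(absences, scores)
print(f"SSE = {sse:.3f}, s_e = {s_e:.3f}")
```

The divisor n − 2 reflects the two parameters (slope and intercept) estimated from the data.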

The statistic sε estimates the standard deviation (σ) of the model's normal random error term (ε).
It means that the standard deviation of final test grades for all students with the same number
of absences is around 13.85 points. Such a large value on a 100-point exam means that
the final exam scores within each sub-population of students with the same number of absences are highly variable.

The size and sign of the slope of the predicted line (−4.54) imply that, on
average, students score 4.54 points lower on the final test for each class missed. Similarly,
students tend to score 2 × 4.54 = 9.08 points less on the final test for every two classes missed,
or around a letter grade lower on average. The s-intercept is meaningful in this situation
because 0 is inside the range of d-values in the data set. It is an estimate of the average final test
grade for all students who have perfect attendance; that is, the intercept is
the anticipated average for such students.
Before we go any further with the regression equation or run any other analysis, it's a good idea
to check how useful the linear regression model is. This can be done in two ways:
1. By calculating the correlation coefficient r to see how closely the number
of absences (d) and the final exam score (s) are related.
2. By testing the null hypothesis H0: β1 = 0 (the slope of the population
regression line is zero, so d is not a useful predictor of s) against the natural
alternative Ha: β1 < 0 (the population regression line has a negative slope, so final
exam scores (s) decrease as absences (d) increase).
The correlation coefficient is r = −0.6088.
This indicates a moderately negative relationship between the two variables:
as a student's absences increase, the score tends to decrease.
Scores and absences are the two variables of interest in this study.
The hypotheses concerning absenteeism and student scores are given below:
Null Hypothesis (H0): Absences do not affect scores.
Alternative Hypothesis (Ha): Absences affect scores.
Having stated the two hypotheses, we now use a t-test to decide whether the data give
enough evidence to reject the null hypothesis in favor of the alternative. The t-test is
used to test hypotheses about the regression coefficients obtained in simple linear regression. The
hypothesis that the true slope, β1, equals some constant value, β1,0, is tested using a
statistic based on the t distribution. Let's carry out the test at the commonly used
5% level of significance. The hypothesis test statements are written as follows:
H0: β1 = 0 vs. Ha: β1 < 0, at α = 0.05

From the table of critical values of t, with 30 − 2 = 28 degrees of freedom, t0.05 = 1.701, so the
rejection region is (−∞, −1.701]. The value of the standardized test statistic is t ≈ −4.06.
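As a cross-check that uses only figures reported in this paper (r = −0.6088 and n = 30), the slope test statistic in simple linear regression can also be written as t = r√(n − 2)/√(1 − r²). The sketch below evaluates that identity and compares the result with the one-sided 5% critical value for 28 degrees of freedom, taken from a standard t table. The variable names are illustrative.

```python
import math

r = -0.6088      # correlation coefficient reported in the text
n = 30           # sample size reported in the text
t_crit = 1.701   # t_{0.05} with n - 2 = 28 degrees of freedom (standard t table)

# In simple linear regression the slope t-statistic depends only on r and n.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Left-tailed test of H0: beta1 = 0 against Ha: beta1 < 0.
reject_h0 = t_stat <= -t_crit
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The computed value is about −4.06, in line with the t-value reported in the conclusion of the paper, and it falls well inside the rejection region.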

This value lies in the rejection region, so we reject H0 in favor of Ha. At the 5% level of significance,
the data support the conclusion that β1 is negative, implying that as the number of
absences grows, the average final exam score declines. As previously stated, the
slope estimate of −4.54227 is a point estimate of how much one additional absence
affects the average final exam result: the average decreases by around 4.54227 points for each
additional absence.
The frequency of absences is visualized as a histogram in Figure 3.

Figure 3:
Histogram of absences with their frequency
We can observe from this histogram that the majority of students have absences in the range
of zero to three. We can expand the point estimate of β1 to a confidence interval.
From the table of critical values of t, with df = 30 − 2 = 28 degrees of freedom, tα/2 = t0.025 = 2.048 at the 95 percent
confidence level. Based on our sample data, the 95 percent confidence interval for β1 is:
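A confidence interval for the slope has the standard form: slope estimate ± tα/2 × sε / √SS_dd. The sketch below applies that form to a small data set; the function name, the data, and the notation SS_dd are hypothetical, and the critical value 2.776 is the standard table value of t0.025 for the 4 degrees of freedom of this six-point example (not the 28 degrees of freedom of the study).

```python
import math


def slope_confidence_interval(d, s, t_crit):
    """Return (lower, upper) for slope_hat +/- t_crit * s_e / sqrt(SS_dd)."""
    n = len(d)
    mean_d, mean_s = sum(d) / n, sum(s) / n
    ss_dd = sum((x - mean_d) ** 2 for x in d)
    ss_ds = sum((x - mean_d) * (y - mean_s) for x, y in zip(d, s))
    slope = ss_ds / ss_dd
    intercept = mean_s - slope * mean_d
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(d, s))
    s_e = math.sqrt(sse / (n - 2))          # sample standard deviation of errors
    margin = t_crit * s_e / math.sqrt(ss_dd)
    return slope - margin, slope + margin


# Hypothetical illustration data (NOT the study's data set); n = 6, so df = 4.
absences = [0, 1, 2, 3, 5, 7]
scores = [92, 88, 85, 79, 70, 60]
low, high = slope_confidence_interval(absences, scores, t_crit=2.776)
print(f"95% CI for the slope: ({low:.2f}, {high:.2f})")
```

An interval that lies entirely below zero, as in this example, agrees with a conclusion that the slope is negative.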

We are 95 percent confident that, among all students who have ever taken this course, the
average final test score drops by 2.94 to 7.52 points for each extra class missed. If we focus on
the sub-population of all students who had exactly five absences, we may estimate the average
final test score for those students by evaluating the least squares regression equation at d = 5:

This is also our best estimate of a student's final exam grade if he or she is absent five times. The
average final test score for all students with five absences has a 95% confidence interval of (59.92, 75.14):
the true mean final test score for all students who miss
class exactly five times over the semester is expected to lie between 59.92 and 75.14. If a
student misses exactly five classes during the semester, a 95 percent prediction interval for
his or her final exam score is:
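In general form, the two intervals differ only by one term under the square root. The sketch below uses assumed notation that is not in the original text: d_p is the chosen number of absences (here 5), ŝ_p the predicted score at d_p, d̄ the mean number of absences, SS_dd the sum of squared deviations of d, and t_{α/2} the critical value with n − 2 degrees of freedom.

```latex
\begin{align*}
\text{mean score at } d_p:\quad
  & \hat{s}_p \pm t_{\alpha/2}\, s_\varepsilon
    \sqrt{\frac{1}{n} + \frac{(d_p - \bar{d})^2}{SS_{dd}}} \\
\text{individual score at } d_p:\quad
  & \hat{s}_p \pm t_{\alpha/2}\, s_\varepsilon
    \sqrt{1 + \frac{1}{n} + \frac{(d_p - \bar{d})^2}{SS_{dd}}}
\end{align*}
```

Both intervals are centered at the same predicted score; the additional 1 in the second formula accounts for the variability of a single student's score around the mean.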

This prediction interval indicates that this student's final exam score will most likely fall
somewhere between 38.16 and 96.90. Unlike the 95 percent confidence interval for the
average score of all students with five absences, which provided useful information, this
interval is so wide that it reveals almost nothing about the final exam score of any particular
student. This shows the dramatic effect that the extra summand 1 under the square root sign
in the prediction interval formula can have. Finally, the coefficient of determination, r², estimates
the proportion of the variability in students' final exam scores that is explained by the linear
relationship between that score and the number of absences. Since we have already calculated r,
we can readily deduce:
r² = (−0.6088)² ≈ 0.37
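The coefficient of determination can be obtained either by squaring r or as 1 − SSE/SS_ss, and the two routes agree for a least squares line. The sketch below checks this on a small data set and also squares the r reported in this paper; the function names, the data, and the notation SS_ss are hypothetical.

```python
import math


def correlation(d, s):
    """Pearson correlation coefficient r between d and s."""
    n = len(d)
    mean_d, mean_s = sum(d) / n, sum(s) / n
    ss_dd = sum((x - mean_d) ** 2 for x in d)
    ss_ss = sum((y - mean_s) ** 2 for y in s)
    ss_ds = sum((x - mean_d) * (y - mean_s) for x, y in zip(d, s))
    return ss_ds / math.sqrt(ss_dd * ss_ss)


def explained_fraction(d, s):
    """1 - SSE / SS_ss for the least squares line of s on d."""
    n = len(d)
    mean_d, mean_s = sum(d) / n, sum(s) / n
    ss_dd = sum((x - mean_d) ** 2 for x in d)
    ss_ds = sum((x - mean_d) * (y - mean_s) for x, y in zip(d, s))
    slope = ss_ds / ss_dd
    intercept = mean_s - slope * mean_d
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(d, s))
    ss_ss = sum((y - mean_s) ** 2 for y in s)
    return 1 - sse / ss_ss


# Hypothetical illustration data (NOT the study's data set).
absences = [0, 1, 2, 3, 5, 7]
scores = [92, 88, 85, 79, 70, 60]
r = correlation(absences, scores)
r2 = explained_fraction(absences, scores)

# Squaring the correlation reported in the text.
reported_r2 = (-0.6088) ** 2
print(f"r^2 = {r ** 2:.4f}, 1 - SSE/SS_ss = {r2:.4f}, reported r^2 = {reported_r2:.2f}")
```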

As a result, the regression model explains about 37 percent of the variability in final exam scores,
which indicates a moderate fit. Although there is a significant link
between attendance and final test performance, and we can estimate the average score
of students who miss a specific number of classes with reasonable accuracy, the number of
absences accounts for less than half of the total variation in exam scores in the sample. This is
hardly surprising, given that student exam performance is influenced by a variety of factors
other than attendance.
A residual plot is a graph in which the residuals are displayed on the vertical axis and the
independent variable is displayed on the horizontal axis. A linear regression model is
appropriate for the data if the dots in a residual plot are randomly dispersed around the
horizontal axis; otherwise, a nonlinear model is more suitable. Table 3 shows the output of the linear
regression model and the residuals.
Figure 4 shows that the residual plot displays a haphazard pattern: some residuals
are positive, while others are negative. This random pattern implies that the data are well fit by
the proposed linear model.
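The residuals plotted in such a graph are the observed scores minus the fitted scores. A minimal sketch of how they can be computed and inspected follows; the function name and the data are hypothetical and are not the values in Table 3.

```python
def residuals(d, s):
    """Return observed minus fitted scores for the least squares line of s on d."""
    n = len(d)
    mean_d, mean_s = sum(d) / n, sum(s) / n
    slope = (sum((x - mean_d) * (y - mean_s) for x, y in zip(d, s))
             / sum((x - mean_d) ** 2 for x in d))
    intercept = mean_s - slope * mean_d
    return [y - (intercept + slope * x) for x, y in zip(d, s)]


# Hypothetical illustration data (NOT the study's data set).
absences = [0, 1, 2, 3, 5, 7]
scores = [92, 88, 85, 79, 70, 60]
res = residuals(absences, scores)

# For a least squares fit with an intercept, the residuals sum to zero
# (up to floating-point rounding), so they must change sign somewhere.
for x, e in zip(absences, res):
    print(f"d = {x}: residual = {e:+.3f}")
print(f"sum of residuals = {sum(res):.2e}")
```

Plotting these pairs with d on the horizontal axis gives the residual plot; a pattern with no visible curve or funnel shape is what supports a linear model.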
V. Conclusion
The findings of this study revealed that absence has a major impact on academic achievement.
The analysis combined data visualization with statistical modelling.
The findings revealed a moderately negative relationship between
the number of absences and the final score (r = −0.6088, p = 0.00036, which is less than 0.05) and a significant difference in exam
scores between students who missed 22% or fewer of their classes and students
who missed more than 23% of their classes (t = −4.06075, p < 0.05). The key
finding was that for each class a student misses, the final test score is projected to drop by
4.54 points on average. It is hoped that the findings of this study will help colleges
and universities plan for students to graduate on time. Furthermore, this study has the
potential to raise student awareness of the impact of missing classes on academic
performance.

Table 3:
Output of Linear Regression Model and residuals
Figure 4:
Residual plot of proposed regression model
“Comment”
This research focuses on student absences from class and exam scores and was
carried out using linear regression analysis. Linear regression analysis is an excellent
method and one of the simplest statistical tools. Descriptive statistics, Student's t-test, Pearson correlation,
and regression models were used in this study's statistical analysis. According to the results of
this study, there is a significant relationship between absences and score (t = −4.06075,
p < 0.05). The study also found that absenteeism from class had a negative correlation
with the score (r = −0.6088). To investigate the impact of class absences on student score, a
regression model was created. This study will benefit both the college administration and the
students by raising awareness of the disadvantages of not attending classes.
