Analysis of Attention Span of Students Using Deep Learning
Varad Warankar1, Nishtha Jain2, Bhavesh Patil3, Mohammed Faizaan4,
Dr. Balaso Jagdale5, Dr. Shounak Sugave6
Department of Computer Science and Engineering,
Dr. Vishwanath Karad MIT World Peace University, Pune
[email protected], [email protected]
Abstract - This research work presents an experimental study of the Analysis of Attention Span of Students using Deep Learning, a novel application employing deep learning techniques for assessing student engagement in educational settings. The system incorporates facial recognition, eye tracking, and head pose analysis to offer real-time insights into students' attentiveness during lessons.
The introduction outlines the motivation behind the research, emphasizing its significance in the dynamic landscape of education. The report addresses the project's relevance in optimizing learning environments, delivering personalized education, and addressing challenges in remote learning scenarios.
Keywords - Artificial Intelligence, Deep Learning, Eye Gaze, Face Recognition, Head Position, Attention.

I. INTRODUCTION
In the dynamic landscape of education, the imperative for efficacious tools to evaluate and augment student engagement is paramount. The "Analysis of Attention Span of Students using Deep Learning" addresses this exigency by leveraging advanced computer vision and deep learning techniques, affording educators real-time insights into students' attentiveness during instructional sessions. This initiative is propelled by the escalating significance of personalized learning experiences and the aspiration to foster dynamic, adaptive teaching environments.
Our system aims to enhance classroom engagement through an attention detection algorithm, integrating facial recognition for real-time assessment of student attention levels. A streamlined data collection pipeline records facial expressions during lessons, feeding into a Time-Attention Graph that visualizes engagement trends.
The goal is to provide teachers with a user-friendly dashboard for immediate insights, enabling quick instructional adjustments. The primary objective is to reliably determine student attention, optimizing teaching effectiveness through a data-driven approach.

II. LITERATURE REVIEW
The proposed Head Position Estimator provides a practical solution for head pose estimation, leveraging 68 facial features for ease of implementation, as shown in fig 1.0. In real-world scenarios, dynamic environments introduce challenges for focus-of-attention (FOA) estimation, leading to potential issues such as suboptimal quality, occlusion, low-resolution images, disruptions, and non-linear correlations between head pose angles and actual values [1][6].
The attention span formula implemented in the system of RK et al. [2] adopts a practical approach by leveraging both the camera and microphone for real-time calculations. In practical terms, this means that the system actively measures attention span while considering external factors such as environmental disruptions or low-resolution images. This attention span formula serves as a dynamic tool, adapting to real-world conditions and providing a more reliable metric for assessing and monitoring attention levels.
The system by Meng-Hao Guo et al. [3] addresses the challenge of multiple attention demands in a practical manner. By utilizing a computer camera, the attention calculation takes into account the face position and eye gaze, crucial elements in real-world attention assessments. This practical implementation acknowledges the complexities of diverse attention environments, providing adaptability to different scenarios and ensuring the system's relevance in practical usage.
A Convolutional Neural Network (CNN) is a class of deep learning models specifically designed for image processing and recognition tasks. It utilizes convolutional layers to automatically and adaptively learn spatial hierarchies of features.
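To make the convolution operation concrete, here is a minimal sketch of a single 2D convolution (valid padding, stride 1, one filter) of the kind a CNN layer learns; the kernel values below are illustrative, not taken from the paper's model:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return
    the feature map of shape (H - kH + 1, W - kW + 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel responding to left-to-right intensity changes.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
print(conv2d_valid(image, kernel))  # each row: [0. 2. 0.]
```

A trained CNN learns many such kernels from data rather than hand-picking them, stacking the resulting feature maps through pooling and further convolutions.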
Authorized licensed use limited to: PES University Bengaluru. Downloaded on February 14,2025 at 03:56:29 UTC from IEEE Xplore. Restrictions apply.
image is detected and stored with high precision, thus
providing a solid foundation for training a robust and unbiased
facial recognition model.
Data Preprocessing and Augmentation
Fig. 3. (a) Data Collection

    confidence = detections[0, 0, i, 2]
    if confidence > 0.7:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        face_crop = frame[startY:endY, startX:endX]
        cv2.imwrite(face_filename, face_crop)
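The snippet above assumes a detections array in the layout produced by OpenCV's DNN face detector. As a self-contained sketch of the same box-scaling and cropping logic, the function name and synthetic detections below are illustrative, not from the paper's code:

```python
import numpy as np

def crop_faces(detections, frame, conf_threshold=0.7):
    """Scale normalized detection boxes to pixel coordinates and crop faces.

    `detections` follows the OpenCV DNN face-detector layout (1, 1, N, 7),
    where index 2 is the confidence and 3:7 are the normalized
    (x1, y1, x2, y2) box corners.
    """
    h, w = frame.shape[:2]
    crops = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            crops.append(frame[startY:endY, startX:endX])
    return crops

# Synthetic example: one confident detection covering the image center.
frame = np.zeros((200, 200, 3), dtype=np.uint8)
detections = np.zeros((1, 1, 1, 7), dtype=np.float32)
detections[0, 0, 0, 2] = 0.9                          # confidence
detections[0, 0, 0, 3:7] = [0.25, 0.25, 0.75, 0.75]   # normalized box
faces = crop_faces(detections, frame)
print(len(faces), faces[0].shape)  # 1 (100, 100, 3)
```

In the actual pipeline the frame would come from the webcam and the crops would be written to disk with cv2.imwrite, as in the listing above.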
The preprocessing and augmentation of data are pivotal steps in our methodology, carried out by the dataprocessing.py script. This script standardizes the images to a uniform format of 100x100 or 200x200 pixels and normalizes the pixel values to a [0, 1] range. To enhance the model's ability to generalize across various scenarios and reduce overfitting, we employ data augmentation techniques, including random rotations and shifts. These techniques artificially expand the dataset, introducing a broader range of facial positions and expressions, thereby improving the model's robustness and its ability to perform accurately in real-world conditions.

Optimized Data Processing

    datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True)

C. Training and Model Architecture
Our model's architecture, designed for facial recognition and attention span analysis, is outlined in the trainingdata.py script. The model is a Convolutional Neural Network (CNN) that includes multiple layers of convolution and pooling, followed by dropout layers for regularization, and a dense layer for the final classification. The CNN architecture is specifically chosen for its effectiveness in handling image data and its ability to learn hierarchical feature representations. By training this model on our diverse and augmented dataset, we achieved significant improvements in accuracy, with the model demonstrating a remarkable ability to reduce bias across different demographics.
The application of our dataset diversity and optimization techniques culminated in the development of a deep learning model that achieved an accuracy rate of 97.78% under uncertain conditions and 98.66% in controlled laboratory conditions. This high level of accuracy is indicative of the model's robustness and its capability to analyze attention spans across a diverse student population with minimal bias. The incorporation of multithreading into our methodology not only enhanced the model's training efficiency but also demonstrated a reduction in training time by 40%, highlighting the effectiveness of our optimized approach.

D. Model Training:
After the data collection phase, the focus shifts to model training. This stage involves employing sophisticated algorithms that analyze and learn patterns from the processed data. The model is specifically trained to recognize key facial features such as eye aspect ratio, eye gaze, and head rotation. The use of machine learning algorithms allows our model to adapt and improve its accuracy over time as it processes more data.

Application: With the trained model in place, the next step is its application in real-time scenarios. The model operates continuously, calculating the attention span percentage of students during live interactions. The results are stored in a comprehensive CSV file, which includes not only the average attention span but also the unique student ID, the duration the camera was active, and details about the specific subject for which attention data is being recorded (see Fig 6.0).

Web Page: The application's integration into a web-based platform is a crucial aspect of our project. The front end, developed using JavaScript, Python, HTML, and CSS, provides a user-friendly interface. The back end, powered by Flask, fetches data from the CSV file, ensuring real-time visualization through dynamic graphs on the web page. Additionally, an Admin panel (see Fig 3.0b) is implemented, offering a secure sign-up and sign-in process for access control, ensuring that only authorized individuals can interact with and view the collected data.

Fig. 3. (b) Admin panel for end user

To access the graph and detailed data, users must interact with the system by entering their unique student ID. This personalized approach ensures privacy and tailored access to the information. Once the user inputs their ID, the system loads the corresponding data, offering a customized experience and maintaining confidentiality (see Fig 4.0).
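The Application step's CSV logging can be sketched with the standard library alone; the column names and function below are illustrative assumptions, not the paper's actual schema:

```python
import csv
import io
from statistics import mean

def log_attention(records, subject, student_id, duration_s, out):
    """Write a header plus one summary row (assumed schema) to a CSV stream.

    `records` is a list of per-frame attention percentages collected
    while the camera was active for the given subject.
    """
    writer = csv.writer(out)
    writer.writerow(["student_id", "subject", "camera_duration_s",
                     "avg_attention_pct"])
    writer.writerow([student_id, subject, duration_s,
                     round(mean(records), 2)])

# Example: per-frame attention samples for one student and one subject.
buf = io.StringIO()
log_attention([80.0, 90.0, 70.0], "Mathematics", "S101", 1800, buf)
print(buf.getvalue())
```

The Flask back end described above would then read this file back and filter rows by the student ID entered on the web page.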
Fig. 4. Attention dashboard

Throughout the project, a carefully crafted timeline (see Fig 5.0) serves as a guiding beacon. This timeline outlines the sequential progression of project phases, ensuring a systematic and efficient development process. It acts as a reference point, allowing the team to track progress, meet deadlines, and maintain a cohesive workflow.

Fig. 6. Facial recognition and attention analysis

C. Model 3: Head Rotation Model
The Head Rotation Model evaluates the orientation of the face. By establishing a median plane, the system can discern the angular position of the face relative to this plane. The head's rotation is then calculated by measuring the deviation from this median plane.
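The paper's exact head-rotation formula is not reproduced in this excerpt, so the sketch below shows one plausible realization of "angular deviation from the median plane": the median plane is approximated by the vertical line through the midpoint of the eye landmarks, and the angle measures how far the nose tip deviates from it. The landmark choice and normalization are assumptions:

```python
import math

def head_rotation_deg(left_eye, right_eye, nose_tip):
    """Estimate a yaw-like head rotation angle in degrees.

    Landmarks are (x, y) pixel coordinates. The nose tip's horizontal
    deviation from the inter-eye midpoint, normalized by the inter-eye
    distance, is converted to an angle.
    """
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_dist = math.dist(left_eye, right_eye)
    # Normalized deviation of the nose tip from the median plane.
    deviation = (nose_tip[0] - mid_x) / eye_dist
    return math.degrees(math.atan(deviation))

# A frontal face: nose tip on the median plane -> 0 degrees.
print(head_rotation_deg((40, 50), (80, 50), (60, 80)))  # 0.0
```

A nose tip displaced toward one eye yields a correspondingly signed angle, which can then be thresholded to flag a head turned away from the screen.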
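The gaze-direction computation described next, g = (P − C) / ||P − C||, can be sketched concretely with numpy; the landmark layout and the brightest-pixel search below are simplifying assumptions mirroring the pseudocode's helper names:

```python
import numpy as np

def calculate_pupil_center(eye_landmarks):
    """Mean position of the eye landmarks associated with the pupil (P)."""
    return np.mean(np.asarray(eye_landmarks, dtype=float), axis=0)

def locate_corneal_reflection(gray_eye_region):
    """Brightest pixel in the eye region, taken as the specular highlight (C)."""
    y, x = np.unravel_index(np.argmax(gray_eye_region),
                            gray_eye_region.shape)
    return np.array([x, y], dtype=float)

def normalize_vector(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def calculate_gaze_vector(eye_landmarks, gray_eye_region):
    pupil_center = calculate_pupil_center(eye_landmarks)
    corneal_reflection = locate_corneal_reflection(gray_eye_region)
    return normalize_vector(pupil_center - corneal_reflection)

# Synthetic eye region with one bright highlight at (x=2, y=3);
# pupil landmarks average to (5, 3), so the gaze vector points right.
eye = np.zeros((8, 8))
eye[3, 2] = 255
g = calculate_gaze_vector([(4, 3), (6, 3)], eye)
print(g)  # prints [1. 0.]
```

In the real system the eye region would be extracted from the camera frame via facial landmarks, and the thresholding step of the pseudocode would precede the brightest-pixel search.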
The gaze vector is computed as

    g = (P - C) / ||P - C||

where P is the pupil's center, computed as the mean position of the eye landmarks associated with the pupil; C is the corneal specular reflection point, identified through the analysis of high-intensity pixels within the eye region; and ||P - C|| is the normalization factor ensuring the gaze vector is a unit vector, facilitating a standardized measure of gaze direction across different individuals. Here '|| ||' denotes the magnitude of the vector.

Pseudo Code for Gaze Direction Estimation

    function calculateGazeVector(eyeLandmarks, frame):
        thresholdFrame = applyColorThreshold(frame)
        pupilCenter = calculatePupilCenter(eyeLandmarks)
        cornealReflection = locateCornealReflection(thresholdFrame)
        gazeVector = normalizeVector(pupilCenter - cornealReflection)
        return gazeVector

Key steps in the pseudo code are as follows:

applyColorThreshold(frame): Applies a color-based threshold to identify the corneal specular reflection. This method is designed to work directly on the color frame, focusing on the high-intensity areas that correspond to specular reflections, without converting the image to grayscale.

calculatePupilCenter(eyeLandmarks): Computes the center of the pupil based on the detected eye landmarks. This step averages the positions of landmarks around the pupil to find its center, ensuring that the calculation is sensitive to the actual eye geometry.

locateCornealReflection(thresholdFrame): Identifies the corneal specular reflection point within the thresholded frame. This function searches for the brightest areas in the image, which are indicative of specular reflections, to accurately locate the corneal reflection.

normalizeVector(vector): Normalizes the vector calculated from the difference between the pupil center and the corneal reflection point. Normalization is crucial for standardizing the gaze direction across different frames and subjects, allowing for a consistent assessment of gaze direction.

Combining these three modules gives us a way to find the attention of the student. From fig 6.0 we can see that our project is able to determine the attention of the student, and it also stores the student ID as well as the subject. This is then deployed to the web-based page created using Python, JavaScript, and HTML, in which a graph is created from the collected data, as shown in fig 7.0, giving insights into the attention span.

Fig. 7. Insight visualization

V. ANALYSIS AND RESULTS
In our approach to ensuring the proper functioning of our system, we meticulously stored individualized data for each student in our dataset. This comprehensive student data repository allowed us to conduct a thorough examination of the system's output by employing the model on multiple students. This iterative process was crucial in evaluating the accuracy and effectiveness of our system.

A. Results and System Optimization:
As we continued to augment the dataset and fine-tune the system, we observed a commendable improvement in accuracy. The accuracy metric reached an impressive 97%, reflecting the successful incorporation of additional data and the continuous optimization of our system, as shown in Fig 8.0. This iterative process of data augmentation and system refinement played a pivotal role in enhancing the overall performance.

Fig. 8.

B. Attention Calculation Precision:
Our system's attention calculation demonstrated remarkable precision, particularly when the face of the student was consistently visible on the screen. External factors, such as lighting conditions, were identified as potential influencers on visual data accuracy. Consequently, maintaining optimal visibility of the student's face was identified as a key factor in ensuring accurate attention measurements.
Through empirical testing and validation, the enhanced EAR formula and improved gaze detection algorithm have shown a marked improvement in the precision of attentiveness measurement. Our system now accurately identifies attention lapses with an improved detection rate of up to 20% over traditional methods. Moreover, the system demonstrates a significant reduction in false positives and negatives, ensuring that students' attentiveness is measured more reliably and accurately.
In the development of our attention span analysis system using deep learning, we recognized performance as a crucial element for its application in real-time scenarios. To tackle this, we've adopted multithreading optimizations, which markedly improve the system's efficiency by facilitating parallel processing of video stream data alongside our model inference. This section delves into the applied techniques, their impact on system performance, and the actual performance improvements observed.

C. Multithreading Implementation
Our strategy involved leveraging the Python threading module to introduce multithreading into our system. The primary objective was to segregate the video capture process,
face detection, gaze detection, and attention span analysis into distinct threads. This method enables these components to run simultaneously, diminishing the reliance on sequential execution and thus decreasing processing times.

D. System Architecture:
The architecture of the system incorporates four principal threads:

Video Capture Thread: Constantly captures video frames from the webcam and stores them in a thread-safe queue for further processing.

Face Detection Thread: Takes frames from the queue to conduct face detection using a pre-trained model, pinpointing regions of interest (ROIs) for subsequent analysis.

Gaze and Blink Detection Thread: Processes ROIs to identify eye blinks, determine gaze direction, and compute the Eye Aspect Ratio (EAR) for evaluating attentiveness.

Data Aggregation and Analysis Thread: Compiles the processed data, computes attention span metrics, and updates the user interface or database instantaneously.

E. Multithreading Libraries and Tools
For thread management and inter-thread communication, we used Python's threading library and Queue module. The Queue module proved especially valuable for ensuring thread-safe operations when transferring data between the producer (video capture) and consumer threads (processing tasks).

F. Performance Improvement
The introduction of multithreading led to a significant uplift in the system's overall performance. The major performance metrics observed include:

Processing Latency: We noticed a reduction in the average processing time per frame by roughly 30-50%, depending on the hardware and conditions. This enhancement is mainly due to the simultaneous processing of video frames and machine learning inference.

Throughput: The system experienced a 50-60% boost in the number of frames processed per second, facilitating smoother real-time analysis and feedback.

Resource Utilization: There was a more effective distribution of CPU resources, ensuring a balanced load across cores and minimizing the risk of any single core becoming a bottleneck.

These performance improvements stem from the task parallelization that replaces previous sequential task execution. For example, while the face detection thread processes one frame, the video capture thread can concurrently capture the next frame, and the gaze detection thread can process the preceding frame. This approach minimizes CPU core idle times and optimizes the use of available hardware resources, leading to a more efficient and responsive system.

G. User-Friendly Frontend and Robust Flask Backend:
The success of our system is not only attributed to the accuracy of data analysis but also to the seamless integration of a user-friendly frontend and a robust Flask backend. The front end, designed with versatility in mind, caters to the needs of various stakeholders involved in the education ecosystem. It provides an intuitive interface for users to interact with and interpret the attention span data.
Meanwhile, the Flask backend serves as the backbone of our application, efficiently managing data processing, analysis, and storage. The synergy between the frontend and backend components ensures a smooth and effective user experience, contributing to the overall success of the system.

VI. CONCLUSION
In conclusion, this paper represents a significant stride in harnessing technology for educational enhancement. Through the meticulous amalgamation of deep learning, computer vision, and web development, we have crafted a system adept at calculating student attention spans, a tool that promises to be invaluable in educational settings.
Our journey began with the rigorous collection and processing of data, which laid a solid foundation for our machine learning model. The high degree of accuracy achieved in recognizing individual students underscored the model's robustness, allowing for a nuanced evaluation of attention spans.
The crux of our success hinged on seamless algorithmic integration. Utilizing OpenCV for image processing and MediaPipe for real-time analysis, we pioneered a method that not only tracks but understands attention patterns. These algorithms operate in concert, ensuring precise and instantaneous monitoring of engagement levels.
The development of the frontend and backend of the application was approached with user-centric design philosophies, ensuring that the system was approachable for all stakeholders. The Flask backend acts as the stronghold of the application, deftly managing the complex data interactions and providing a sturdy platform for the frontend to display results.
Our deployment strategy was carefully selected to reflect the system's need for flexibility. By utilizing a dedicated server for deployment and local hosting environments for development and testing, we underscored our commitment to creating a scalable and adaptable system.
The rigorous performance evaluation, encompassing both load and stress testing, confirmed the system's resilience and scalability. It stands as a testament to the robust architecture that reliably bears the weight of extensive data loads and user interactions.
As we look to the future, this research work does not just mark the culmination of a research endeavor; it marks the beginning of a new chapter in educational technology. It opens up the possibility of further enhancements and sets the stage for transforming the educational landscape. The potential for this technology to evolve and integrate into various educational frameworks is vast, signaling a new era where technology and pedagogy converge to enrich the learning experience.
The success of this research work and its potential for growth signal a beacon for future research, inviting exploration into new domains that could further augment the utility and efficacy of attention tracking in education.

REFERENCES
[1] T. Singh, M. Mohadikar, S. Gite, S. Patil, B. Pradhan and A. Alamri, "Attention Span Prediction Using Head-Pose Estimation With Deep Neural Networks," in IEEE Access, vol. 9, pp. 142632-142643, 2021, doi: 10.1109/ACCESS.2021.3120098.
[2] R. RK, S. S, V. P and S. K, "Real-time Attention Span Tracking in Online
Education," 2020 IEEE MIT Undergraduate Research Technology
Conference (URTC), Cambridge, MA, USA, 2020, pp. 1-4, doi:
10.1109/URTC51696.2020.9668900.
[3] M.-H. Guo, T.-X. Xu, J.-J. Liu, Z.-N. Liu, P.-T. Jiang, T.-J. Mu, S.-H. Zhang, R. R. Martin, M.-M. Cheng and S.-M. Hu, "Attention Mechanisms in Computer Vision: A Survey," arXiv:2111.07624. https://arxiv.org/pdf/2111.07624.pdf
[4] S. Gorji and J. J. Clark, "Attentional Push: A Deep Convolutional Network for Augmenting Image Salience with Shared Attention Modeling in Social Scenes," Centre for Intelligent Machines, McGill University, Montreal, Quebec, Canada.
[5] A. Silver, A. Gangopadhyay, G. Gawarkiewicz, A. Taylor and A. Sanchez-Franks, "Forecasting the Gulf Stream Path Using Buoyancy and Wind Forcing Over the North Atlantic," Journal of Geophysical Research: Oceans, vol. 126, 2021, doi: 10.1029/2021JC017614.
[6] R. T. H. Hasan and A. B. Sallow, "Face Detection and Recognition Using OpenCV," Duhok Polytechnic University and Nawroz University, Duhok, Kurdistan Region, Iraq, doi: 10.30880/jscdm.2021.02.02.008.
[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin, "Attention Is All You Need," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
[8] D. E. King, "Facial Landmark Detection with OpenCV," dlib. URL: http://dlib.net/face_landmark_detection.py.html
[9] Zhiwei Luo, Junliang Xing, David Zhang, “Facial Landmark Detection
and Tracking for Driver Monitoring Systems”, 2017 IEEE Transactions
on Biometrics, Behavior, and Identity Science (TBIOM)
[10] Soukaina Belabbes, Abdellatif Benabdelhafid, Abderrahim Saaidi,
"Real-Time Eye Blink Detection Using Facial Landmarks" 2018
International Conference on Intelligent Systems and Computer Vision
(ISCV)
[11] S. Tyagi, K. Panchal, P. Chitta, S. Todi, S. Priya and R. Pawar,
"Emotionomics: Pioneering Depression Detection Through Facial
Expression Analytics," 2023 7th International Conference On
Computing, Communication, Control And Automation (ICCUBEA),
Pune, India, 2023, pp. 1-4, doi:
10.1109/ICCUBEA58933.2023.10392157
[12] R. Pawar, S. Ghumbre and R. Deshmukh, "Visual similarity using
convolution neural network over textual similarity in content- based
recommender system", International Journal of Advanced Science and
Technology, vol. 27, pp. 137-147, Sep. 2019.
[13] A. Yadav and D. K. Vishwakarma, "Investigating the Impact of Visual
Attention Models in Face Forgery Detection," 2023 International
Conference on Applied Intelligence and Sustainable Computing
(ICAISC), Dharwad, India, 2023, pp. 1-7,
doi:10.1109/ICAISC58445.2023.10199338.
[14] P. Liang et al., "Face Detection Using YOLOX with Attention
Mechanisms," 2022 10th International Conference on Information
Systems and Computing Technology (ISCTech), Guilin, China, 2022,
pp. 457-462, doi: 10.1109/ISCTech58360.2022.00077.