Advancing Road Safety: A CNN-Based Semi-Supervised Learning Approach for Drivers' Drowsiness Detection
Abstract—The National Safety Council (NSC) has reported that each year, driver fatigue is responsible for 100,000 accidents, 71,000 injuries, and 1,550 fatalities, often manifesting as drowsiness. Most current systems fail to anticipate accidents beforehand, focusing primarily on external factors. This paper introduces a novel drowsiness detection framework aimed at reducing accidents caused by drivers dozing off behind the wheel and minimizing harm to individuals engaged in prolonged computer use. The proposed method leverages a Convolutional Neural Network (CNN) with a semi-supervised learning technique, setting it apart from existing approaches. The effectiveness of our approach is evaluated on the UTA-RLDD video dataset, and the results demonstrate that the proposed method outperforms state-of-the-art methods, achieving 99.98% accuracy.

Index Terms—Fatigue, drowsiness detection, CNN, semi-supervised learning.

I. INTRODUCTION

Driver fatigue is a major global issue that contributes significantly to road accidents. The World Health Organization reported 1.19 million traffic deaths in 2021 [1]. In addition, many studies have projected more than 320,000 drowsy driving accidents annually, causing about 6,400 fatal crashes. Data from the National Highway Traffic Safety Administration (NHTSA) show a yearly increase in fatalities from drowsy driving. Fatigue, often classified into sleep-related and task-related categories, is a major contributor to road accidents worldwide [2]. Fatigued driving is particularly prevalent among international truck drivers, night shift workers, young men, individuals with untreated sleep disorders, and those with irregular sleep patterns.

Drowsiness thus represents a state of reduced wakefulness and alertness that can significantly impair an individual's ability to perform tasks effectively and safely. It can also be triggered by various factors such as inadequate sleep, sleep disorders, medication side effects, and prolonged wakefulness. Drowsiness is characterized by abnormal sleepiness or fatigue, particularly during the day. However, it may also occur at inappropriate times such as while driving, operating machinery, performing office work [3], or engaging in physically demanding tasks [4]. Consequently, drowsy drivers exhibit impaired reaction times, decreased attention, reduced decision-making abilities, and an increased likelihood of nodding off behind the wheel, significantly elevating the risk of road accidents, injuries, and fatalities. This poses a serious risk to both the individuals experiencing it and those around them. Studies have shown that drowsy driving accounts for a substantial number of accidents globally [5]. In response, detecting and mitigating driver drowsiness has become a research area of paramount importance. This has led to the emergence of three primary technologies and strategies for drowsiness detection.

• Physiological Methods: These involve direct measurements of bodily changes, such as electroencephalograms (EEG) [6], electrocardiograms (ECG), electrooculograms (EOG) [7], and respiration rates. These methods offer good accuracy and can be used in real-time to detect sleepiness but are not widely adopted due to their complex nature [8].
• Vehicle-Based Methods: These methods analyze parameters such as steering wheel movement [9], vehicle speed, lane deviation, steering angle, brake speed, lateral displacement, and acceleration. Recent advancements have integrated intelligent algorithms into vehicles to monitor both the vehicle and driver, enhancing driving safety [10].
• Behavioral Methods: These methods focus on analyzing facial and head movements to identify indicators of drowsiness, utilizing machine learning and computer vision techniques. Key indicators include the Eye Aspect Ratio (EAR), yawning frequency, Percentage of Eyelid Closure (PERCLOS), and head movement patterns [11], [12].

While physiological methods, such as heart rate and EEG signal monitoring, and vehicle-based methods, which analyze driving patterns, have been used to detect driver fatigue, behavioral methods have gained prominence due to their accuracy and reliability. These methods focus on observing and analyzing a driver's facial expressions and eye movements, providing direct and immediate indicators of drowsiness. Behavioral methods are particularly effective because they analyze changes in facial features and eye dynamics, such as the Eye Aspect Ratio (EAR), yawning, and head movement. These indicators are closely associated with drowsiness and can be detected in real-time using advanced computer vision techniques. Recent advancements in deep learning have further enhanced these methods by improving the accuracy of facial feature analysis and enabling more robust and timely detection of driver fatigue.

In the realm of feature extraction, several studies have made significant contributions. For instance, [13] and [14] employ CNNs and Haar-cascade techniques, respectively, for facial feature extraction and drowsiness detection, achieving high accuracy rates. Temporal data analysis is also crucial for capturing patterns over time. Studies by [15], [16], and [17] utilize various LSTM-based networks to analyze sequential facial and blink features, demonstrating effective fatigue detection. [18] further contributes to this theme with an IndRNN for classifying temporal blink sequences.

Regarding classification techniques, [19] compares KNN and multiclass SVM for fatigue detection, highlighting the impact of classification techniques on performance. Recent advancements in deep learning techniques, such as Vision Transformers and YOLOv5, have shown significant improvements in drowsiness detection, as demonstrated by [20]. Finally, a study focusing on real-time facial movement monitoring [21] highlights the effectiveness of combining feature extraction and advanced machine learning techniques for drowsiness prediction. By extracting derivative features and training various models, this study achieved the highest R-squared value of 93.65% with a decision tree model.

This research aims to develop a new vision-based semi-supervised framework for detecting driver drowsiness, leveraging advanced machine learning and deep learning capabilities. This paper makes the following three contributions:

• A customized implementation of YOLOv8 for accurate face detection.
• A robust binary image classification model using a CNN with semi-supervised learning for drowsiness detection.
• A real-time implementation for driver drowsiness detection.

The remainder of this paper is organized as follows: Section II describes the proposed model for drowsiness detection, including the YOLOv8-based face detection and the CNN with semi-supervised learning for feature extraction and classification. Evaluation metrics and experimental results are presented and discussed in Section III. Finally, Section IV concludes the paper and offers perspectives for future work.

II. PROPOSED FRAMEWORK

Towards intelligent traffic, this research proposes a novel framework for advancing road safety. The proposed framework combines computer vision and semi-supervised learning techniques for driver drowsiness detection. Our framework focuses on monitoring and analyzing facial cues in real-time to improve not only the accuracy of the model but also the response time, by leveraging a CNN for feature extraction and a semi-supervised learning technique for robust classification (see Figure 1).

The system begins with face detection and preprocessing, followed by the extraction of drowsiness-related features and classification using a customized CNN model. Finally, the output from the CNN model is fed into a semi-supervised learning algorithm that leverages both labeled and unlabeled data to improve the model's generalization and performance, ultimately achieving an accurate drowsiness detection level.

Fig. 1. The overall framework of the proposed method for drowsiness detection.

A. Face detection & preprocessing

This stage aims to prepare the data for further processing. It consists of three main steps: converting videos to images, cropping face images, and resizing face images.

The first step involves converting input videos into a stream of video frames (30 frames per second) saved to a directory, followed by detecting drivers' faces using YOLOv8 [22], a model known for its real-time precision in detection and segmentation. Additionally, the model isolates areas of interest within the frames, such as facial features, and detects corresponding landmarks of the eyes and face. To standardize the input images for the model and reduce computational costs, all cropped faces are resized to a resolution of 224×224.
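For illustration, the sketch below outlines this preprocessing stage using OpenCV for frame extraction and the ultralytics YOLOv8 API; the face-detection checkpoint name and confidence threshold are placeholders rather than the exact configuration used here.

```python
# Minimal preprocessing sketch (assumptions: OpenCV, the ultralytics package,
# and a YOLOv8 checkpoint fine-tuned for face detection; weights name and
# confidence threshold are illustrative).
import cv2
from pathlib import Path
from ultralytics import YOLO

face_detector = YOLO("yolov8n-face.pt")  # hypothetical face-detection weights

def video_to_face_crops(video_path: str, out_dir: str, conf: float = 0.5) -> int:
    """Decode a video, detect the driver's face per frame, save 224x224 crops."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = face_detector(frame, conf=conf, verbose=False)[0]
        if len(result.boxes) == 0:
            continue
        # Keep the highest-confidence detection as the driver's face.
        x1, y1, x2, y2 = result.boxes.xyxy[result.boxes.conf.argmax()].int().tolist()
        crop = frame[y1:y2, x1:x2]
        if crop.size == 0:
            continue
        crop = cv2.resize(crop, (224, 224))
        cv2.imwrite(str(out / f"frame_{saved:06d}.jpg"), crop)
        saved += 1
    cap.release()
    return saved
```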
B. Feature extraction and classification

After passing the input data through the data preparation and preprocessing pipeline, it is ready to be fed into our CNN model for feature extraction and classification. The model is structured to process the data efficiently by extracting relevant patterns and minimizing computational overhead, with classification as its primary function.

The CNN architecture consists of four convolutional layers for feature extraction. In the first layer, a 3×3 convolution with a stride of 1 and padding of 1 is applied, generating 512 output channels. Batch normalization is then performed to maintain stable learning, followed by the ReLU activation function to introduce non-linearity and mitigate the vanishing gradient problem. A max-pooling layer with a 2×2 kernel and a stride of 2 reduces the spatial dimensions for computational efficiency.

Unique to this architecture is the inclusion of a channel attention mechanism after the first convolutional layer, which enhances the model's focus on critical features at the early stages. The attention mechanism adaptively assigns weights to each channel, prioritizing the most relevant feature maps. This mechanism can be formulated as:

A = σ(W2(ReLU(W1(AvgPool(X))))),   (1)

where X is the input feature map, W1 and W2 are learnable weight matrices, and σ is the sigmoid activation function. The output A is a set of attention weights that scale the channels, allowing the model to focus on important features. This attention mechanism balances complexity and efficiency, enabling robust feature extraction and classification in tasks such as driver drowsiness detection.
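A minimal PyTorch sketch of this channel attention block is shown below; the reduction ratio between W1 and W2 is an illustrative choice, since Eq. (1) does not fix the sizes of the weight matrices.

```python
# Channel attention of Eq. (1), squeeze-and-excitation style.
# The reduction ratio r = 16 is an assumption, not specified in the paper.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)          # AvgPool(X): B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),   # W1
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),   # W2
            nn.Sigmoid(),                                 # sigma
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        a = self.fc(self.avg_pool(x).view(b, c))          # attention weights A
        return x * a.view(b, c, 1, 1)                     # rescale each channel

# Example: attention over the 512 feature maps of the first convolutional layer.
feats = torch.randn(8, 512, 112, 112)
out = ChannelAttention(512)(feats)   # same shape, channel-wise reweighted
```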
Subsequent layers progressively reduce the number of output channels from 512 to 64, with each convolutional layer followed by batch normalization, ReLU activation, and max-pooling. This hierarchical design allows the model to refine feature maps as it deepens, improving both accuracy and computational efficiency. After feature extraction, the model applies an adaptive average pooling layer to standardize the output size, followed by two fully connected layers. The first fully connected layer has 1024 neurons and includes a dropout layer with a 20% rate to prevent overfitting. The second fully connected layer reduces the dimensionality to 512 neurons. Finally, a Softmax layer outputs probability distributions for binary classification, using cross-entropy loss to determine the predicted class. The Softmax function transforms the raw output scores (logits) into probabilities, where each class's probability indicates the model's confidence in that class. The class with the higher probability is chosen as the predicted class. During training, the model minimizes the cross-entropy loss, which quantifies the difference between the predicted probabilities and the true labels, guiding the model to improve its classification accuracy.

Figure 2 shows the flow of data through the convolutional, pooling, and fully connected layers. This figure provides a visual representation of the key components, including the channel attention mechanism, and highlights how the different stages work together to perform feature extraction and classification.

Fig. 2. Proposed architecture for drowsiness detection.
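The sketch below assembles the described architecture in PyTorch. The intermediate channel widths (256 and 128) and the activations between the fully connected layers are illustrative assumptions, as only the 512-to-64 reduction and the 1024- and 512-neuron layers are specified above.

```python
# CNN sketch following the description in Section II-B (assumed widths 256/128).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):  # Eq. (1), as in the previous sketch
    def __init__(self, c: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                nn.Linear(c // r, c), nn.Sigmoid())
    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))       # global average pool over H, W
        return x * w[:, :, None, None]

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # 3x3 convolution (stride 1, padding 1), then BN, ReLU, and 2x2 max-pooling.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

class DrowsinessCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 512),
            ChannelAttention(512),        # early channel attention
            conv_block(512, 256),
            conv_block(256, 128),
            conv_block(128, 64),
            nn.AdaptiveAvgPool2d(1),      # standardize spatial size to 1x1
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.2),
            nn.Linear(1024, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),  # logits; softmax applied at inference
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Training minimizes cross-entropy on the raw logits (nn.CrossEntropyLoss applies
# log-softmax internally); softmax probabilities are taken at inference time.
model = DrowsinessCNN()
logits = model(torch.randn(4, 3, 224, 224))
probs = torch.softmax(logits, dim=1)
```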
C. Semi-Supervised Learning

The semi-supervised learning phase [23] in our research involves leveraging a small amount of labeled data combined with a large amount of unlabeled data. Initially, the dataset is divided into labeled subsets (training, validation, and testing) and an unlabeled subset. The model is trained on the labeled data, while predictions on the unlabeled data, filtered by a confidence threshold, are iteratively added to the training set. This process continues until the model's performance stabilizes, resulting in a more robust model that benefits from the additional pseudo-labeled data. Figure 3 shows the proportional split of the studied dataset.
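The following sketch illustrates this self-training loop. The confidence threshold, batch size, number of rounds, and the supervised training routine train_fn are placeholders rather than the exact settings used; the labeled dataset is assumed to yield (image, label) pairs and the unlabeled loader to yield image batches only.

```python
# Pseudo-labeling sketch of the semi-supervised phase (threshold and rounds
# are illustrative assumptions).
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

@torch.no_grad()
def pseudo_label(model, unlabeled_loader, threshold: float = 0.95, device="cpu"):
    """Return a dataset of unlabeled images whose predicted class is confident."""
    model.eval()
    kept_x, kept_y = [], []
    for x in unlabeled_loader:
        x = x.to(device)
        probs = torch.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        mask = conf >= threshold
        if mask.any():
            kept_x.append(x[mask].cpu())
            kept_y.append(pred[mask].cpu())
    if not kept_x:
        return None
    return TensorDataset(torch.cat(kept_x), torch.cat(kept_y))

def self_training(model, labeled_ds, unlabeled_loader, train_fn, rounds: int = 5):
    """Alternate supervised training and pseudo-label expansion of the train set."""
    train_ds = labeled_ds
    for _ in range(rounds):
        train_fn(model, DataLoader(train_ds, batch_size=64, shuffle=True))
        pseudo_ds = pseudo_label(model, unlabeled_loader)
        if pseudo_ds is None:
            break
        train_ds = ConcatDataset([labeled_ds, pseudo_ds])
    return model
```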
III. RESULTS & DISCUSSIONS

The proposed approach is evaluated and validated on the UTA-RLDD dataset, and its performance is compared with other drowsiness detection models. The experiments are conducted under the same conditions, using identical evaluation metrics and protocols to ensure a fair comparison.

The UTA-RLDD dataset [16], [24] is a comprehensive collection of approximately 30 hours of RGB video footage from 60 healthy individuals. The dataset includes three video recordings for each participant, corresponding to three distinct states: alertness, low vigilance, and drowsiness, resulting in 180 videos. Each recording spans 10 minutes at 30 frames per second. The dataset comprises a diverse group of participants, primarily undergraduate or graduate students aged 18, with 51 men and 9 women from various cultural backgrounds. The videos are in Full HD (1080×1920) resolution, and the classification task focuses on distinguishing between alert, low-vigilant, and drowsy states. Figure 4 provides an example from the UTA-RLDD dataset.

A. Evaluation metrics

In this part, we assess the performance of our model using different criteria: accuracy, precision, recall, and F1-score, which are defined as follows.

1) Accuracy: When developing a classification model, it is essential to understand its accuracy, generally defined as the ratio of correctly predicted observations to the total observations. Measuring accuracy enables the assessment of how well a model performs predictions within a dataset. Accuracy is determined through the following formula:

Accuracy = (TP + TN) / (TP + FP + TN + FN)   (2)

2) Precision: Precision, defined as the ratio of accurately predicted positive observations to the total predicted positive observations, serves as an indicator of the model's effectiveness in positive classification. Precision is calculated as:

Precision = TP / (TP + FP)   (3)

3) Recall: Recall measures the ratio of correctly predicted positive observations to all positive observations. Recall quantifies the accuracy of correct predictions across all positive instances, and it can be calculated as:

Recall = TP / (TP + FN)   (4)

In the above equations, TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.
4) F1-score: The F1-score is the harmonic mean of precision and recall, making it useful for assessing precision and recall jointly. F1 is calculated as follows:

F1 = (2 × Precision × Recall) / (Precision + Recall)   (5)
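For completeness, a small helper that implements Eqs. (2)–(5) directly from binary predictions is shown below, assuming 1 encodes the drowsy (positive) class and 0 the alert (negative) class.

```python
# Direct implementation of Eqs. (2)-(5) from binary labels and predictions.
from dataclasses import dataclass

@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float

def binary_metrics(y_true, y_pred) -> BinaryMetrics:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / max(tp + fp + tn + fn, 1)                # Eq. (2)
    precision = tp / max(tp + fp, 1)                                # Eq. (3)
    recall = tp / max(tp + fn, 1)                                   # Eq. (4)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)    # Eq. (5)
    return BinaryMetrics(accuracy, precision, recall, f1)
```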
B. Experimental setup

The system is implemented using the PyTorch framework on a high-performance workstation (HP Z8 G4) to evaluate its performance. The workstation is equipped with an Intel Xeon Silver 4108 processor, a GeForce RTX 2080 Ti graphics processing unit (GPU), and 32 GB of RAM. This robust hardware configuration facilitates efficient training and inference of our convolutional neural network (CNN), ensuring the system can handle the computational demands of real-time video processing and classification tasks.
C. Experimental results

Table I shows that the proposed approach achieves the highest drowsiness detection accuracy of 99.98%, significantly outperforming traditional methods such as HM-LSTM (65.20%) and CNN+LSTM (55.00%). Other approaches, including FaceNet+KNN (94.68%), LSTM-GRU (95.12%), CNN-based models (96.80% and 96.00%), and ViT (97.40%), demonstrate competitive performance, yet still fall short of the proposed method's accuracy.

TABLE I
DROWSINESS DETECTION ACCURACY (%) USING THE UTA-RLDD DATASET FOR VARIOUS APPROACHES.

These gains are largely attributable to the semi-supervised learning (SSL) strategy, which utilizes both labeled and unlabeled data. The integration of SSL not only accelerates training but also enhances feature extraction, making the model well-suited for real-time applications like driver drowsiness detection.

TABLE II
MODEL PERFORMANCE OVER VARYING EPOCHS.

            AUC     Recall   Precision   F1 Score   Accuracy
5 epochs    99.99   99.96    99.43       99.69      99.67
10 epochs   99.98   99.99    99.54       99.77      99.75
20 epochs   100     99.96    100         99.98      99.98

As shown in Figure 5, the model correctly predicted 3,854 negative instances (true negatives) and 4,503 positive instances (true positives), with only two false negatives and no false positives. This highlights the model's exceptional accuracy and precision in classifying both classes with minimal errors.
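As a consistency check, recomputing Eqs. (2)–(5) from the confusion-matrix counts reported for Figure 5 reproduces the 20-epoch row of Table II:

```python
# Metrics recomputed from the Figure 5 confusion matrix
# (TN = 3854, TP = 4503, FN = 2, FP = 0) match the 20-epoch row of Table II.
tn, tp, fn, fp = 3854, 4503, 2, 0
accuracy = (tp + tn) / (tp + fp + tn + fn)          # 0.99976 -> ~99.98%
precision = tp / (tp + fp)                          # 1.0     -> 100%
recall = tp / (tp + fn)                             # 0.99956 -> ~99.96%
f1 = 2 * precision * recall / (precision + recall)  # 0.99978 -> ~99.98%
print(f"{accuracy:.4%} {precision:.4%} {recall:.4%} {f1:.4%}")
```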
Fig. 6. The ROCs for the driver drowsiness detection. Figures in parentheses indicate the area under curves (AUCs).

IV. CONCLUSION

This paper introduces a novel framework that combines a CNN with semi-supervised learning for drowsiness detection. It consists of three stages. The first stage uses YOLOv8 for face detection. The second stage uses a CNN for feature extraction, followed by the integration of a semi-supervised learning technique that leverages labeled and unlabeled data to improve classification accuracy and enhance the model's generalization and robustness. The final stage consists of model evaluation using the UTA-RLDD dataset. The results show that the model achieves high performance, with an average accuracy of 99.98%, outperforming state-of-the-art approaches. The developed system effectively detects driver drowsiness, thereby enhancing safety.

To further improve classification accuracy, we propose expanding the number of driver drowsiness detection categories and increasing the number of facial landmarks and components considered. This expansion will facilitate a more comprehensive analysis and understanding of the driver's drowsiness status in future research. Our future work will also focus on testing the model on various datasets to validate its robustness under different conditions.

REFERENCES

[1] World Health Organization, "Global status report on road safety 2023," Geneva, Switzerland: WHO, 2023.
[2] M. Malcangi, "Applying evolutionary methods for early prediction of sleep onset," Neural Computing and Applications, vol. 27, pp. 1165–1173, 2016.
[3] M. Kołodziej, P. Tarnowski, D. J. Sawicki, A. Majkowski, R. J. Rak, A. Bala, and A. Pluta, "Fatigue detection caused by office work with the use of EOG signal," IEEE Sensors Journal, vol. 20, no. 24, pp. 15213–15223, 2020.
[4] P. Li, R. Meziane, M. J.-D. Otis, H. Ezzaidi, and P. Cardou, "A smart safety helmet using IMU and EEG sensors for worker fatigue detection," in 2014 IEEE International Symposium on Robotic and Sensors Environments (ROSE) Proceedings. IEEE, 2014, pp. 55–60.
[5] N. Ammour, H. Alhichri, Y. Bazi, B. Benjdira, N. Alajlan, and M. Zuair, "Deep learning approach for car detection in UAV imagery," Remote Sensing, vol. 9, no. 4, p. 312, 2017.
[6] S. Khessiba, A. G. Blaiech, K. Ben Khalifa, A. Ben Abdallah, and M. H. Bedoui, "Innovative deep learning models for EEG-based vigilance detection," Neural Computing and Applications, vol. 33, pp. 6921–6937, 2021.
[7] X.-Q. Huo, W.-L. Zheng, and B.-L. Lu, "Driving fatigue detection with fusion of EEG and forehead EOG," in 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016, pp. 897–904.
[8] A. Chowdhury, R. Shankaran, M. Kavakli, and M. M. Haque, "Sensor applications and physiological features in drivers' drowsiness detection: A review," IEEE Sensors Journal, vol. 18, no. 8, pp. 3055–3067, 2018.
[9] S. Arefnezhad, S. Samiee, A. Eichberger, and A. Nahvi, "Driver drowsiness detection based on steering wheel data applying adaptive neuro-fuzzy feature selection," Sensors, vol. 19, no. 4, p. 943, 2019.
[10] Z. Li, L. Chen, J. Peng, and Y. Wu, "Automatic detection of driver fatigue using driving operation information for transportation safety," Sensors, vol. 17, no. 6, p. 1212, 2017.
[11] S. Hachisuka, "Human and vehicle-driver drowsiness detection by facial expression," in 2013 International Conference on Biometrics and Kansei Engineering. IEEE, 2013, pp. 320–326.
[12] M. Karchani, A. Mazloumi, G. N. Saraji, F. Gharagozlou, A. Nahvi, K. S. Haghighi, B. M. Abadi, and A. R. Foroshani, "Presenting a model for dynamic facial expression changes in detecting drivers' drowsiness," Electronic Physician, vol. 7, no. 2, p. 1073, 2015.
[13] I. Nasri, M. Karrouchi, H. Snoussi, K. Kassmi, and A. Messaoudi, "Detection and prediction of driver drowsiness for the prevention of road accidents using deep neural networks techniques," in WITS 2020: Proceedings of the 6th International Conference on Wireless Technologies, Embedded, and Intelligent Systems. Springer, 2022, pp. 57–64.
[14] R. Tamanani, R. Muresan, and A. Al-Dweik, "Estimation of driver vigilance status using real-time facial expression and deep learning," IEEE Sensors Letters, vol. 5, no. 5, pp. 1–4, 2021.
[15] Z. Huang, W. Tang, Q. Tian, T. Huang, and J. Li, "Air traffic controller fatigue detection based on facial and vocal features using long short-term memory," IEEE Access, 2024.
[16] R. Ghoddoosian, M. Galib, and V. Athitsos, "A realistic dataset and baseline temporal model for early drowsiness detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 0–0.
[17] P. Liu, H.-L. Chi, X. Li, and J. Guo, "Effects of dataset characteristics on the performance of fatigue detection for crane operators using hybrid deep neural networks," Automation in Construction, vol. 132, p. 103901, 2021.
[18] A. Kurniawardhani, H. A. Rahman, and I. V. Paputungan, "Non-invasive automatic drowsiness detection using independently recurrent neural network," in 2024 21st International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE, 2024, pp. 534–539.
[19] F. D. Adhinata, D. P. Rakhmadani, and D. Wijayanto, "Fatigue detection on face image using FaceNet algorithm and K-nearest neighbor classifier," Journal of Information Systems Engineering and Business Intelligence, vol. 7, no. 1, pp. 22–30, 2021.
[20] G. S. Krishna, K. Supriya, J. Vardhan, and M. Rao K, "Vision transformers and YOLOv5 based driver drowsiness detection framework," 2022.
[21] S. Kandwal, S. N. Singh, and I. Fatima, "Deep learning approaches for drowsiness prediction and detection," in 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2024, pp. 861–866.
[22] M. Hussain, "YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection," Machines, vol. 11, no. 7, p. 677, 2023.
[23] X. Zhu, Z. Ghahramani, and J. D. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 912–919.
[24] The University of Texas at Arlington, "Real-Life Drowsiness Dataset (RLDD)," Data set, 2019. [Online]. Available: https://sites.google.com/view/utarldd/home