Natural Language Processing and ML Based Student Mental Health Analysis Using Non Clinical Texts

This document describes a student project that aims to analyze student mental health using natural language processing and machine learning techniques applied to non-clinical texts. The project was completed by five students to fulfill their bachelor's degree requirements in computer science and engineering at Bangladesh University of Business and Technology. It includes a literature review on previous related work, a proposed methodology involving data preprocessing, natural language processing algorithms, and multiclass classification models. The goal of the project was to predict student mental health conditions early through analysis of written texts to help counselors and guardians intervene sooner.
Copyright: © All Rights Reserved
Natural language processing and ML based Student Mental Health Analysis using non-clinical texts

A Project Report

Submitted By,

Md. Asif Jahan (36, 16173103017)

MD. Sakib Hossain Alif (36, 16173103025)

Abu Bakor Siddik Nayem (35, 16172103135)

Md. Abul Bashar (35, 16172103095)

Md. Emdadul Islam (35, 16172103105)

A Project/Thesis Submitted in Partial Fulfillment of the Requirements for
the Degree of Bachelor of Science in Computer Science of the
Bangladesh University of Business and Technology (BUBT)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BANGLADESH UNIVERSITY OF BUSINESS AND TECHNOLOGY (BUBT)

DHAKA-BANGLADESH

JULY, 2021

DECLARATION

We hereby declare that the project entitled “Natural language processing and ML based
Student Mental Health Analysis using non-clinical texts”, submitted for the degree of
Bachelor of Science Engineering in Computer Science and Engineering in the faculty of
Computer Science and Engineering of Bangladesh University of Business and Technology
(BUBT), is our original work and that it contains no material which has been accepted for
the award to the candidates of any other degree or diploma, except where due reference is
made in the text of the project. To the best of our knowledge, it contains no materials
previously published or written by any other person, except where due reference is made in
this project.

Md. Asif Jahan


ID: 16173103017 Signature

MD. Sakib Hossain Alif


ID: 16173103025 Signature

Abu Bakor Siddik Nayem


ID: 16172103135 Signature

Md. Abul Bashar


ID: 16172103095 Signature

Md. Emdadul Islam


ID: 16172103105 Signature

APPROVAL

This in-depth research report, “Natural language processing and ML based Student
Mental Health Analysis using non-clinical texts”, submitted by Md. Asif Jahan,
MD. Sakib Hossain Alif, Abu Bakor Siddik Nayem, Md. Abul Bashar, and Md. Emdadul Islam,
students of the Department of Computer Science and Engineering, Bangladesh University
of Business and Technology (BUBT), under the guidance of Dr. Muhammad Firoz Mridha,
Chairman and Associate Professor, and under the careful supervision of Nahid Anwar,
Assistant Professor, Department of Computer Science and Engineering, Bangladesh
University of Business and Technology, has been accepted as satisfying the partial
requirements for the degree of Bachelor of Science Engineering in Computer Science and
Engineering.

Nahid Anwar
Assistant Professor
Department of CSE, BUBT

Dr. Muhammad Firoz Mridha
Chairman and Associate Professor
Department of CSE, BUBT

ABSTRACT

Recent years have been golden years for the machine learning revolution. Many researchers
have published a great deal of research, contributing greatly to their nations and to the
world.

As researchers, we took on a project that can help young people. If a mental illness is
caught one or two years early, there is an 85% chance that the individual will overcome it
and lead a happy and healthy life. Our project gives an early warning to guardians or
counselors. In our research, we show how mental health can be predicted using NLP and
classification algorithms.

Our model achieved good accuracy and precision. The focus of our project is to identify
the mental condition of a student before it is too late.

ACKNOWLEDGEMENT

This whole period was a journey toward the final outcome of this project. Many good things
happened to us because of it. We would like to thank all the great souls who helped us
along the way. We are truly grateful to Almighty Allah, who keeps us in his mercy and
under his care. Our families are the second most important thing in our lives. We would
like to thank our mothers, who keep believing in us. We would also like to thank our
friend Ratul Ahmed, who helped us a lot, both directly and indirectly.

We thank our supervisor, Nahid Anwar, Assistant Professor, Department of CSE, Bangladesh
University of Business and Technology, from the bottom of our hearts. He gave us expert
guidance throughout and kept us motivated. He is a great supervisor, and we were very
lucky to have him. This work was completed because of his sharp guidance and
encouragement. We owe him a lot and offer him our sincerest gratitude.

Our department is the home where we can get any kind of help we need, and we are truly
grateful to it for all its support. We are grateful to all the BUBT students who
participated in this research. We thank everyone who helped us. Respect for everyone.

TABLE OF CONTENTS

DECLARATION
APPROVAL
ABSTRACT
ACKNOWLEDGEMENT
List of Figures
List of Tables
LIST OF ABBREVIATIONS

CHAPTER 1: INTRODUCTION
1.1 Introduction
1.2 Problem Statement
1.3 Problem Background
1.4 Research Objectives
1.5 Motivations
1.6 Flow of Research
1.7 Significance of the Research
1.8 Research Contribution
1.9 Thesis Organization
1.10 Summary

CHAPTER 2: BACKGROUND
2.1 Introduction
2.2 Literature Review
2.3 Problem Analysis
2.4 Summary

CHAPTER 3: PROPOSED MODEL
3.1 Introduction
3.2 Feasibility Analysis
3.3 Requirement Analysis
3.4 Research Methodology
3.4.1 Data Preprocessing
3.4.2 NLP algorithms
3.4.3 Multiclass Classification Algorithms
3.4.3.5 Word2Vec
3.4.3.6 LSTM
3.5 Design, Implementation and Simulation
3.6 Summary

CHAPTER 4: IMPLEMENTATION, TESTING AND RESULT ANALYSIS
4.1 Introduction
4.2 Dataset
4.3 System Setup
4.4 Evaluation
4.5 Result and Discussion
4.6 Summary

CHAPTER 5: STANDARDS, IMPACTS, ETHICS AND CHALLENGES
5.1 Sustainability
5.2 Impacts on Society
5.3 Ethics
5.4 Challenges
5.5 Summary

CHAPTER 6: CONSTRAINTS AND ALTERNATIVES
6.1 Design Constraints
6.2 Component Constraints
6.3 Budget Constraints
6.4 Summary

CHAPTER 7: SCHEDULE, TASKS AND MILESTONES
7.1 Timeline
7.2 Gantt Chart

CHAPTER 8: CONCLUSION
8.2 Future Work and Limitations

References

APPENDICES

List of Figures

Figure 1.1: Work Flow for this Research
Figure 3.1: Support Vector Machine
Figure 3.2: K-Nearest Neighbor
Figure 3.3: Decision Tree
Figure 3.4: Random Forest
Figure 3.5: Design of our model
Figure 4.1: Open-ended Questions
Figure 4.2: Replace Dictionary
Figure 4.2: Before performing the replace operation
Figure 4.3: After preprocessing

List of Tables

Table 4.1: SVM Classifier (LinearSVC) For NLP
Table 4.2: Naive Bayes (MultinomialNB) For NLP
Table 4.3: Naive Bayes (BernoulliNB) For NLP
Table 4.4: Logistic Regression for Multi-Class Classification
Table 4.5: KNN for Multi-Class Classification
Table 4.6: Decision Tree for Multi-Class Classification
Table 4.7: Random Forest for Multi-Class Classification
Table 4.8: Accuracy Comparison for NLP
Table 4.9: Accuracy Comparison for Multi-Class Classification

LIST OF ABBREVIATIONS

KNN     k-Nearest Neighbor
ANN     Artificial Neural Network
DL      Deep Learning
CNN     Convolutional Neural Network
RNN     Recurrent Neural Network
SVM     Support Vector Machine
SVC     Support Vector Classifier
LSTM    Long Short-Term Memory

CHAPTER 1: INTRODUCTION

1.1 Introduction

Mental health includes our emotional, psychological, and social well-being. The way we
think and feel about our lives depends on it, and it determines how we respond to stress
and hardship. Mental health is important at every stage of life, and especially during
student life, because we build our character and future in those years. If we are not
mentally well, we can face many different problems in life. Elena et al. [1] reported a
21% prevalence of depression and suicidal tendencies among the undergraduate students
they studied. They also found that students who are diagnosed early have a higher chance
of avoiding this kind of destructive behavior. Harmeet et al. [2] found that the
prevalence of mental health disorders is high among undergraduate students in the USA:
more than 22% of students were bothered by depression and 8% by posttraumatic stress
disorder. They state that “There is an urgent need to develop strategies for early
screening and management of mental health services in university settings”. We tried to
build a system that lets the management of an institution know whether a student is
facing any kind of mental health issue, using both NLP (Natural Language Processing) and
classification algorithms. To the best of our knowledge, this kind of NLP-based work has
not been conducted before. With the help of our supervisor and fellow researchers, we
believe this project will be a valuable contribution to the machine learning community.
We worked on it collaboratively for about a year. We have seen that algorithms such as
Naïve Bayes and support vector machines perform well on NLP tasks. For multiclass
classification, we use Logistic Regression, KNN, Decision Tree, and Random Forest
classifiers. Our dataset is designed for this kind of model: it consists of answers to
both open-ended and close-ended questions. Student mental health analysis from
non-clinical text is important work for building a better society. For this purpose, we
study existing papers on this and related topics and build a model that performs well on
this task.

1.2 Problem Statement
Taking supportive action for students with mental health difficulties is very challenging
because there is not enough time to communicate with each student individually. As a
result, many students are left without help. To overcome this challenge, we built a
specialized model based on our survey dataset. This is difficult work, because even a
psychiatrist can take months to identify a mental health problem. We tried to capture a
student's inner feelings through a special questionnaire, which was designed by a
well-known doctor in this field. Based on this questionnaire, our model predicts whether
a student has a mental illness and, if so, the condition of their mental health.

1.3 Problem Background

Current research uses a student's family background to look for signs of mental illness
in that student. However, no research has examined whether students can answer a set of
questions to learn whether they need special attention, and no existing dataset meets
this criterion. A dataset built from students' own responses about their lives and
viewpoints is very informative and helpful. This work will be a valuable contribution to
both institutional management and the student community.

1.4 Research Objectives

The main objectives of our research are as follows:

• Studying mental health issues and classification problems
• Building a dataset by surveying undergraduate students
• Using NLP for open-ended questions
• Using multiclass classification for the whole dataset
• Comparing the accuracy of all the algorithms
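
The last two objectives can be illustrated with a minimal sketch. It is not the project's implementation: the answers, the 1-5 answer encoding, the three class labels, and the majority-class baseline are all hypothetical, chosen only to show how a multiclass classifier (here a small hand-written k-nearest-neighbor) can be scored and compared by accuracy.

```python
from collections import Counter

# Hypothetical closed-ended answers encoded on a 1-5 scale (made-up data).
# Hypothetical labels: 0 = "fine", 1 = "at risk", 2 = "needs attention".
train_X = [[1, 1, 2], [2, 1, 1], [1, 2, 1], [2, 2, 1],
           [3, 3, 3], [3, 2, 3], [2, 3, 3],
           [5, 4, 5], [4, 5, 5], [5, 5, 4]]
train_y = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
test_X = [[1, 1, 1], [3, 3, 2], [5, 5, 5], [2, 1, 2]]
test_y = [0, 1, 2, 0]

def knn_predict(x, k=3):
    """k-nearest neighbors: squared Euclidean distance, then majority vote."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, row)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def majority_predict(x):
    """Baseline for comparison: always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]

def accuracy(predict):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Compare the classifiers by accuracy, best first.
results = {
    "KNN (k=3)": accuracy(knn_predict),
    "Majority baseline": accuracy(majority_predict),
}
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```

The same `accuracy` helper could score any other classifier that maps one answer vector to one label, which is the form a comparison across several algorithms would take.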

1.5 Motivations

To build a better nation, we have to properly develop students' character and educational
background. To do this, we have to make sure our students are mentally fit and hopeful
about their dreams. But most undergraduate students face mental health issues such as
depression, stress, and anxiety. These cause many problems in their lives after
graduation. To help these students, we undertook this research to find a better solution
for them. The NLP field is currently evolving, and there are robust algorithms for this
kind of research. We use both NLP and multiclass classification to address this
challenging problem.

1.6 Flow of Research


Any research work is made up of many small steps combined together. Our first step was to
find related research in this field. Then we tried to understand how we could take
advantage of NLP and classification algorithms in our work. After that, we built our
customized dataset from undergraduate students. Our workflow is illustrated in the
following diagram.

Figure 1.1: Work Flow for this Research

1.7 Significance of the Research

While surveying prior work, we found no implementation on this topic in Google Scholar.
Researchers use either NLP or classification alone to solve this type of problem, and
their datasets are not based on students' own opinions. Our work will help the management
of any institution identify the mental condition a student is currently experiencing, so
that they can arrange counselling and other programs to support these students.

1.8 Research Contribution

The contributions of our work are:

• Identifying the mental health issues that are creating the problem
• Creating a new dataset using both open-ended and close-ended questions
• Building a combined model of NLP and multiclass classification algorithms to solve
this problem
• Finding the accuracy of various algorithms

1.9 Thesis Organization


This thesis is organized into chapters. Chapter 1 introduces the problem. Chapter 2
reviews the literature we studied to understand our problem more accurately and to learn
the current state of the algorithms. Chapter 3 describes the model: how it is created and
how it works. Chapter 4 presents the evaluation of the proposed machine learning model on
the dataset. Chapter 5 explains the impacts on society and the standards, ethical policy,
and challenges of the proposed model. Chapter 6 describes the overall design and
implementation constraints of the thesis work. Chapter 7 contains the time schedule of
the thesis. Finally, Chapter 8 gives the overall conclusion of the thesis work.

1.10 Summary

This chapter gives a brief idea of our work and why we chose it. It also shows the
overall workflow of this thesis and how the work will be done in the time allocated for
this project.

CHAPTER 2: BACKGROUND

2.1 Introduction

Mental health has long been an important topic for researchers. Many researchers have
shown that mental illness in students is linked to their lifestyle and family background.
To the best of our knowledge, this kind of NLP-based work has not been conducted before.
With the help of our supervisor and fellow researchers, we believe this project will be a
valuable contribution to the machine learning community. We worked on it collaboratively
for about a year. We have seen that algorithms such as Naïve Bayes and support vector
machines perform well on NLP tasks. For multiclass classification, we use Logistic
Regression, KNN, Decision Tree, and Random Forest classifiers. Our dataset is designed
for this kind of model: it consists of answers to both open-ended and close-ended
questions. Using our custom-made dataset, we tried to find a relation between these
factors and applied NLP and classification algorithms to find a better, machine
learning-based way to identify students who are mentally unwell and need special
attention.

2.2 Literature Review

In this review, we first try to understand some of the reasons students become mentally
unwell. We then look at NLP approaches to understanding the emotion of a sentence.
Finally, we look at how classification can distinguish a mentally unwell person from a
mentally healthy person.

Islam et al. [3] studied the prevalence of depression, anxiety, and stress among students
in Bangladesh using the DASS-21. They built their dataset through an online survey in
which 3122 Bangladeshi university students took part. 76.1% showed at least mild
symptoms, 62.9% at least moderate symptoms, and 16.5% at least very severe symptoms. Most
participants were single; about 81.9 percent were from nuclear families, 68.3 percent
were from metropolitan regions, and 50.2 percent were from high-income households. 55.3
percent did not engage in any form of physical activity, 77.3 percent were content with
their sleep, 61.9 percent were dissatisfied with their academic pursuits while living
under COVID-19 limitations, and 81.2 percent did not smoke cigarettes. 33.9 percent of
them spent more than 6 hours on the internet.

Anowara et al. [4] investigated the socio-demographic, behavioral, and psychological
variables that affect students' mental health as a result of exams in Bangladesh. They
note that, according to social ecological theory, a person who lives in a neighborhood
with low socioeconomic status (SES) and little social support is likely to have poorer
health outcomes than someone who lives in a better area. In the study environment,
socio-demographic characteristics have an impact on students' mental health. The
student's age, sex, location of residence, education, parents' education, family income,
personal income, relationship status, frequency of contact with family, and
extra-curricular activities are the most prevalent socio-demographic characteristics
mentioned in the literature. Their students ranged in age from 17 to 25 years and were
pursuing college degrees, master's degrees, and high school diplomas. The reported values
were 0.84 (95 percent CI: 0.81 to 0.84), 0.80 (95 percent CI: 0.76 to 0.84), and 0.88
(95 percent CI: 0.85 to 0.90) for stress, anxiety, and depression, respectively.

Mridha MK et al. [5] found that a variety of sociodemographic and lifestyle variables are
linked to depression. Their research was conducted in 82 randomly selected clusters (57
rural, 15 non-slum urban, and 10 slum) throughout eight Bangladeshi divisions. Their
primary outcome measure was "any depression," with additional outcomes including
"minimal, mild, moderate, moderately severe, and severe depression." They found that 75.5
percent, 17.9 percent, 5.4 percent, 1.1 percent, and 0.1 percent of participants had
minimal, mild, moderate, moderately severe, and severe depression, respectively. The
prevalence of any depression was greater among teenage girls across major
sociodemographic, lifestyle, and anthropometric strata. Depression was linked to older
age, greater maternal education, and father's profession in both sexes. Only among boys
was depression linked to greater paternal education, such as full secondary and above
(AOR: 1.42); the lack of another sibling, a family with a teenage member, a non-slum
urban location, Muslim faith, and a family size of 4 were also linked to depression among
girls.

Faisal et al. [6] found that COVID-19 concern predicted psychopathology symptoms among
Bangladeshi university students, indicating a significant prevalence of mental health
disorders. 40.2 percent of students reported moderate to severe anxiety symptoms
(moderate at 23.6 percent and severe at 16.6 percent), while 72.1 percent had moderate to
severe depression symptoms. Furthermore, path analysis revealed that anxiety and
knowledge of COVID-19 were associated with moderate to poor mental health, and that
information and belief about COVID-19's severity in Bangladesh were associated with
depressive symptoms. More than half of the participants (53.9 percent) had a moderate to
poor mental health condition. In terms of views, 34.3 percent disagreed with the
statement "nothing to be concerned about," and nearly two-thirds (62.5 percent) strongly
disagreed with the probability of a COVID-19 pandemic in Bangladesh. When it came to
fear, 77.1 percent felt uncomfortable when thinking about COVID-19's consequences, and
88.1 percent were scared about the upcoming days. Only 9.5 percent of those polled said
they had a complete grasp of COVID-19. COVID-19 was known to 95.1 percent of the
participants, and about three-quarters (77.1 percent) of them modified their hygiene
habits as a result of it.

We have now seen the major factors behind mental illness among university students. Next,
we look at how students' mental health can be surveyed.

Changwon Son et al. [7] showed that the negative effects of the pandemic on students are
increasing day by day. They surveyed 195 students, and 71% of them showed increased
stress and anxiety due to the coronavirus pandemic. The researchers chose the
questionnaire carefully to target various aspects of students' lives and interviewed each
participant via Zoom. They asked the participants about their lifestyle, health routine,
and academic performance. These topics helped them understand the impact of the pandemic
on students in depth.

Velten et al. [8] studied how lifestyle can shape positive mental health. They followed
about 2,991 German and 12,405 Chinese students for one year, recording their lifestyle
choices: body mass index, mental activities, alcohol consumption, smoking, diet, and
social rhythm irregularity. They found that participants with a rich and healthy
lifestyle had a very positive mental state.

O'Dea B et al. [9] examined how web-based monitoring can help counselors and provide
extra care for students who need it. Their survey was completed by 145 school counselors,
and 82.1% thought this model could be very useful. They asked about the students' lives,
health, and similar information. They attached the survey questions to their paper.

Abha et al. [10] showed that NLP techniques can help chatbots respond intelligently by
understanding natural language. They used techniques such as word embeddings and
sentiment analysis, sequence-to-sequence models, and a Naïve Bayes classification
algorithm. They trained their chatbot on subtitles from different movie dialogues.

Rafaela et al. [11] showed how NLP can be used in mental health analysis projects. They
discussed how data can be collected and how preprocessing and labeling can then be done
on those data. They showed which features can be extracted from the data, such as
demographic, lexical, behavioral, and social features. Interventions can then be made for
individuals with mental health problems.

Chilman et al. [12] used NLP to find patients' occupations in healthcare text. They used
a CRIS-derived dataset for the cycles of application development and evaluation. Their
pipeline performs text pre-processing, occupation detection, occupation title assignment,
occupation relation classification, and occupation filtering. The application had a
precision of 79% and a recall of 77% on gold-standard human-annotated clinical text.

Y. Mehta et al. [13] used bottom-up, automated feature generation as part of a deep
learning process to predict personality from text data. The model is used to determine
personality, attributes, and other characteristics. Their dataset consists of essays
written by students and annotated with binary labels of the Big Five personality traits,
obtained using a conventional questionnaire. They used the BERT, ALBERT, and RoBERTa
models. Their prediction model outperformed existing results on this dataset by 1.6
percent.

M. F. Mridha et al. [14] showed that NLP can be used to detect the mental condition of an individual. They showed the basic difference between sentiment and mental condition. They collected their Bengali dataset from social media such as Facebook, Twitter, WhatsApp, and Messenger. Their dataset contains 2000 sentences, of which 814 were abnormal. To extract features from the data they used the count-vector and TF-IDF techniques. For classification they used naïve Bayes and support vector machine. Their accuracy was high and their F1 score was above 88%.

Shikha Tiwari et al. [15] proposed sentiment analysis on a Twitter dataset. They used morphological features and bag of words. For classification they used Decision Tree, SVM, and Random Forest. Their accuracy was 99.3% for Decision Tree, 91.6% for SVM, and 99.4% for Random Forest. They found that the Decision Tree and Random Forest algorithms are more accurate than SVM.

Monika Arora et al. [16] presented a sentiment analysis system for unstructured Twitter data. They used character-level embedding. Text normalization was accomplished using a convolutional neural network. According to their findings, deep learning outperforms SVM and conventional designs, using more hidden layers to extract language labels and fewer parameters to train the CNN architecture.

Graterol et al. [17] showed how a robot can detect emotion from its interaction with humans. They used different types of models such as RNN, LSTM, and GRU. They first capture the video or audio, then convert the audio into text, and finally perform the NLP-based operations.

Kim et al. [18] analyzed depressive emotion using NLP. They took about 2200 SNS posts and marked the specific keywords that highlight the emotion of a user. There were 1297 positive tweets. They used RNN, LSTM, and GRU; the GRU model had an accuracy of 92.2%, 2-4% higher than the other models.

S. B. Hassan et al. [19] found NLP helpful for recognizing suicidal intent in a depressed population. They used a multi-layer system integrating Google Home Mini, the Google Dialogflow machine learning algorithm, and the Twilio API. They categorized the sentiment into ‘happy’, ‘neutral’, ‘depressive’, and ‘suicidal’. They found that the Dialogflow algorithm performs better than traditional algorithms.

M. Y. Manohar et al. [20] found NLP and a corpus-based approach useful for detecting sarcastic statements on social media. They collected their dataset from Twitter and then ran experiments to establish the practical usefulness of this approach.

Aggarwal et al. [21] showed how classification algorithms can solve text classification problems. They discussed various types of classification algorithms such as decision trees, Bayes methods, SVM, and neural networks.

Deepti et al. [22] showed how classification techniques such as decision trees, SVMs, and naïve Bayes may be used to predict diabetes in the early stages of life. In comparison to the other algorithms, their testing revealed that naïve Bayes had the greatest accuracy, 76.30 percent.

Aziliz et al. [23] took a first step towards developing a national dataset of student counselling outcomes: data drawn from two different clinical outcome measures and two platforms were pooled and analyzed. Counselling was particularly effective for improving depression, anxiety, wellbeing, hostility, social anxiety, and academic distress. The results demonstrate the value of pooling complete data from higher education (HE) counselling services, and the authors argue for developing a national dataset of university counselling data.

Yang N et al. [24] note that machine learning and natural language processing models have been hot topics in medicine in recent years and may represent a new paradigm in medical research. However, rather than creating wholly new knowledge, these procedures tend to corroborate clinical theories, and the main population studied (i.e., social media users) is an imprecise cohort. Furthermore, several language-specific features can increase the effectiveness of NLP approaches, and their extension to other languages should be researched further.

Zunic et al. [25] found that the majority of the information was gathered through social media and online commerce platforms. The majority of online discussions are intended to exchange information and give social support. These communities tend to form around serious health problems with a high prevalence of chronicity. The treatments and services covered include medication, vaccines, surgery, orthodontic services, particular physicians, and health care services in general.

Chen et al. [26] studied the global pandemic, which poses serious threats to people's lives as well as to their mental health. Strict self-quarantine and prolonged school suspension may also affect students' mental health. Furthermore, pupils may become disorganized with their regular assignments, thus disrupting their biological cycles. The study was authorized by the relevant institutional research and ethical council, and the gathered questionnaires were checked daily. The RESE scale was created to measure self-efficacy in coping with negative and positive emotions.

In their 2016 paper "Emotion Analysis of Twitter Data That Use Emoticons and Emoji Ideograms", Wolny et al. [27] focus on this issue by analyzing symbols called emotion tokens, including emotion symbols. They describe an approach for extending existing binary sentiment classification approaches using multi-way emotion classification.

Shikha Tiwari et al. [28] proposed sentiment analysis on a Twitter dataset. They used morphological features and bag of words. For classification they used Decision Tree, SVM, and Random Forest. Their accuracy was 99.3 percent for Decision Tree, 91.6 percent for SVM, and 99.4 percent for Random Forest. They found that the Decision Tree and Random Forest algorithms are more accurate than the SVM method.

Ive et al. [29] used cutting-edge text generation techniques to produce synthetic medical data. With the aid of key phrases, they facilitate the creation of EHRs. These key phrases are meaningful extracts from the original text.

From these studies we now know how to conduct our experiment properly. We will build a model that produces the predictions we want.

2.3 Problem Analysis

Our proposed model can do a better job of analyzing the mental health of a student. We have discussed how this can be done and what other researchers have done. To our knowledge, the method we propose has not been developed by other researchers.

2.4 Summary

This chapter gave a detailed description of previous work related to our domain and how the authors solved their problems. We now have a better understanding, so we can move on to the next chapter, where we determine which algorithms to use and how they can be combined to build a better model for our problem. We found solutions and better ideas for the problems we might face in completing our work. These papers give us deep insight into our work and how to complete this project.

CHAPTER 3: PROPOSED MODEL

3.1 Introduction

In this chapter we look in more depth at the subjects of our work. The main purpose of this chapter is to explain how we build our models and which algorithms we choose. To find the best algorithm, we take three or more algorithms and see which performs best. Our main aim is to optimize our model so that it reaches the optimal solution within a short period of time. We discuss various methods and the calculations behind them: how they work and what the basics of each algorithm are. Our proposed model uses two kinds of algorithm, one for the NLP-based solution and one for the classification problem. We analyze the feasibility and requirements, and then discuss in detail the research methodology, design, implementation, and simulation.

3.2 Feasibility Analysis

Five researchers worked together on this project, and one supervisor monitored the progress and timeline. It took about 11 months to complete the work. During that time we collected the dataset and then performed the experiments. Financial support was not necessary for this work.

3.3 Requirement Analysis

The requirements are given below:

• A computer with medium performance
• Software for performing the operations

3.4 Research Methodology

This section describes the methodology in detail: how our model works and predicts the output, and how each algorithm works individually.

First, we want to analyze the sentiment of the open-ended questions. There are 4 open-ended questions.

• Tell us about your sleeping habits over the past 12 months. Have you noticed any
changes? Difficulty sleeping? Restlessness? (Answer in 2 or 3 sentences)
• How would you describe your appetite over the past 4 weeks? Have your eating
habits altered in any way? (Answer in 2 or 3 sentences)
• Could you tell us about any times over the past few months that you’ve been
bothered by low feelings, stress, or sadness? (Answer in 2 or 3 sentences)
• Tell us about how confident you have been feeling in your capabilities recently?
(Answer in 2 or 3 sentences)

The remaining 13 closed-ended questions are:

• How frequently have you had little pleasure or interest in the activities you usually
enjoy?
• How often during the past 6 months have you felt as though your moods, or your
life, were under your control?
• How frequently have you been bothered by not being able to stop your worrying?
• How often have you felt as though the future was bleak, over the past few weeks?
• How much confidence do you have that you can successfully?
• I see myself as a good person_
• I feel positive about my relationships with others and my interpersonal
connections_
• I have others around me who support me_
• How do I feel most of the time?
• I have a positive outlook on my life_
• Are you having any extreme emotions or mood swings? Any___
• Write your CGPA

We build our model in such a way that it first analyzes the sentiment of each respondent from the open-ended questions using NLP. The output of the sentiment analyzer is then used as an input to the final prediction, which falls into one of 5 categories:

• Happy
• Neutral
• Depressive
• Anxious
• Suicidal

3.4.1 Data Preprocessing

The data for our project came from a survey. There were 17 questions; 4 of them were open-ended and the rest were closed-ended. For the open-ended questions, respondents described their feelings in 2 or 3 sentences. For the closed-ended questions, they ticked the answer that suited them best.

As stated earlier, we use an NLP-based classification algorithm for the open-ended questions. To classify the sentiment properly, we marked each individual's response with one of 3 categories: 1 (positive), 0 (neutral), or -1 (negative).

The overall data was labeled with the five categories listed above. We checked for missing values and misinformation.

We then performed label encoding on the dataset to transform the categorical values into numerical values so that they fit the classifier algorithms properly.
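The encoding step can be sketched in plain Python. This is only an illustration: the answer options in `REPLACE` are hypothetical (the survey's real options are not reproduced here), and `encode_column` and `label_encode` are helper names introduced for this sketch, not functions from the project.

```python
# Hypothetical answer options; the survey's actual options are an assumption.
REPLACE = {
    "Never": 0,
    "Sometimes": 1,
    "Often": 2,
    "Always": 3,
}

def encode_column(values, mapping):
    """Map each categorical answer to its integer code; fail loudly on unknowns."""
    encoded = []
    for value in values:
        if value not in mapping:
            raise ValueError(f"unmapped answer: {value!r}")
        encoded.append(mapping[value])
    return encoded

def label_encode(values):
    """Assign codes 0..k-1 to the sorted distinct labels."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return [index[v] for v in values], classes

answers = ["Often", "Never", "Always", "Sometimes"]
print(encode_column(answers, REPLACE))  # [2, 0, 3, 1]

targets = ["Happy", "Suicidal", "Neutral", "Happy"]
codes, classes = label_encode(targets)
print(codes, classes)  # [0, 2, 1, 0] ['Happy', 'Neutral', 'Suicidal']
```

An explicit dictionary keeps ordinal answers in a meaningful order, whereas sorting the labels alphabetically (as `label_encode` does) is adequate for the target classes, which have no inherent order.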

3.4.2 NLP algorithms

Natural Language Processing is a technique for processing natural language using machines. For our project we used a linear support vector classifier and some naïve Bayes classifiers.

3.4.2.1 TF-IDF

First, we clean the sentences using regular expressions. Then we use the TF-IDF formula to weight each term (feature). TF-IDF is calculated as follows, where $f_{t,d}$ is the number of times term $t$ occurs in document $d$, $D$ is the set of documents, and $N$ is the number of documents in $D$.

TF:

$$tf(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \tag{1}$$

IDF:

$$idf(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|} \tag{2}$$

TF-IDF:

$$tfidf(t, d, D) = tf(t, d) \cdot idf(t, D) \tag{3}$$
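As a worked illustration of Equations (1) to (3), the sketch below computes TF-IDF by hand on three made-up sentences. The sentences and the `clean` regex are assumptions for illustration only. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF by default, so their numbers will differ from this direct computation.

```python
import math
import re
from collections import Counter

def clean(sentence):
    """Lowercase and keep only alphabetic tokens (a simple regex clean-up)."""
    return re.findall(r"[a-z]+", sentence.lower())

def tf(term, doc_tokens):
    """Equation (1): count of the term divided by all token counts in the document."""
    counts = Counter(doc_tokens)
    return counts[term] / len(doc_tokens)

def idf(term, corpus_tokens):
    """Equation (2): log of N over the number of documents containing the term.

    Assumes the term occurs in at least one document.
    """
    containing = sum(1 for doc in corpus_tokens if term in doc)
    return math.log(len(corpus_tokens) / containing)

def tfidf(term, doc_tokens, corpus_tokens):
    """Equation (3): product of term frequency and inverse document frequency."""
    return tf(term, doc_tokens) * idf(term, corpus_tokens)

# Made-up example sentences, not taken from the survey data.
corpus = [
    "I sleep well.",
    "I cannot sleep at night!",
    "My appetite is fine.",
]
tokens = [clean(doc) for doc in corpus]

# "sleep" appears in 2 of 3 documents, "well" in only 1, so "well" scores higher.
print(round(tfidf("sleep", tokens[0], tokens), 3))  # 0.135
print(round(tfidf("well", tokens[0], tokens), 3))   # 0.366
```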

3.4.2.1 Support vector Classification

SVM is a machine learning algorithm that takes data belonging to two or more categories and draws a margin to separate them.

Figure 3.1: Support Vector Machine

As shown in Figure 3.1, the classifier's job is to maximize the margin between the two support vector lines. The optimal hyperplane is the one that achieves the largest margin for the dataset. The equations for selecting the SVM hyperplane are given below.

Given a dataset $D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i \in \{-1, 1\}\}_{i=1}^{m}$, we compute $\beta_i = |w \cdot x_i + b|$ for each training example, and $B$ is the smallest $\beta_i$ we get:

$$B = \min_{i=1 \ldots m} |w \cdot x_i + b|$$

If we have $s$ hyperplanes, each of them will have a $B_i$ value, and we select the hyperplane with the largest $B_i$ value:

$$H = \max_{i=1 \ldots s} \{ h_i \mid B_i \}$$
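The selection rule above can be sketched directly. This is an illustration of the two formulas only, not how an SVM is actually trained (a real solver optimizes w and b rather than choosing from a fixed list). The points and candidate hyperplanes are made up, and the comparison is only meaningful here because both candidates share the same w, so their B values are on the same scale.

```python
def smallest_beta(w, b, points):
    """B = min over training points of |w . x + b| for one hyperplane."""
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in points)

def pick_hyperplane(candidates, points):
    """Return the candidate (w, b) whose B value is largest."""
    return max(candidates, key=lambda h: smallest_beta(h[0], h[1], points))

# Made-up 2-D points: two near the origin, two farther away.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]

# Hypothetical candidate hyperplanes w . x + b = 0 with identical w.
candidates = [
    ((1.0, 1.0), -6.0),  # passes midway between the two groups
    ((1.0, 1.0), -5.0),  # passes closer to the first group
]

for w, b in candidates:
    print(w, b, smallest_beta(w, b, points))

print(pick_hyperplane(candidates, points))  # ((1.0, 1.0), -6.0)
```

The first candidate leaves every point at least 2 units (in unnormalized terms) from the boundary, the second only 1, so the first is selected.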

3.4.2.2 Naïve Bayes Classifier


Naïve Bayes algorithms apply Bayes' theorem with the naïve assumption of conditional independence between every pair of features given the value of the class variable.

Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states:

$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\, P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)} \tag{4}$$

Under the naïve conditional independence assumption, $P(x_i \mid y, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = P(x_i \mid y)$, so Equation (4) simplifies to:

$$P(y \mid x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)} \tag{5}$$

There are several naïve Bayes variants, such as Gaussian, multinomial, complement, and Bernoulli naïve Bayes. The classic naïve Bayes method for text classification is multinomial naïve Bayes, which is intended for multinomially distributed data.
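A minimal sketch of a multinomial naïve Bayes classifier follows, assuming Laplace smoothing with `alpha = 1.0` and a handful of made-up, pre-tokenized sentences. The function names are introduced for this illustration only. The denominator of the Bayes formula is dropped because it is the same for every class and does not affect which class scores highest.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log P(y) and log P(word | y) with Laplace smoothing alpha."""
    vocab = {word for doc in docs for word in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    log_prior = {y: math.log(n / len(labels)) for y, n in class_counts.items()}
    log_likelihood = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_likelihood[y] = {
            w: math.log((word_counts[y][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood

def predict(doc, log_prior, log_likelihood):
    """Pick the class maximizing log P(y) + sum of log P(x_i | y).

    Words never seen in training are ignored.
    """
    scores = {}
    for y in log_prior:
        scores[y] = log_prior[y] + sum(
            log_likelihood[y][w] for w in doc if w in log_likelihood[y]
        )
    return max(scores, key=scores.get)

# Toy, made-up training sentences; 1 = positive, -1 = negative.
docs = [
    ["sleep", "well", "feel", "good"],
    ["feel", "confident", "good"],
    ["cannot", "sleep", "feel", "sad"],
    ["sad", "stress", "cannot", "eat"],
]
labels = [1, 1, -1, -1]
model = train_multinomial_nb(docs, labels)

print(predict(["feel", "good"], *model))   # 1
print(predict(["sad", "cannot"], *model))  # -1
```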

3.4.3 Multiclass Classification Algorithms
Some classification algorithms work on targets with more than two classes, as in our dataset. We used logistic regression, KNN, decision tree, and random forest.

3.4.3.1 Logistic Regression


Logistic regression is the most widely used regression-based classification algorithm. We use it to classify a dataset into 2 or more categories by applying a threshold value: if the output is above the threshold, the prediction is positive; otherwise it is negative. The equation for logistic regression is:

$$y = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}} \tag{6}$$

where $x$ is the input variable, $y$ is the output variable, $b_0$ is the intercept (bias), and $b_1$ is the coefficient (weight) of $x$.
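Equation (6) and the threshold rule can be sketched for the two-class case as follows. The coefficients `b0` and `b1` and the threshold of 0.5 are hypothetical values chosen for illustration; in practice they are learned from the data, and for a five-class target a library would typically combine several such models (for example one-vs-rest) or use a softmax.

```python
import math

def logistic(x, b0, b1):
    """Equation (6): probability of the positive class for input x."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def classify(x, b0, b1, threshold=0.5):
    """Return 1 (positive) if the probability exceeds the threshold, else 0."""
    return 1 if logistic(x, b0, b1) > threshold else 0

# Hypothetical coefficients, not fitted to the survey data.
b0, b1 = -4.0, 2.0

print(logistic(2.0, b0, b1))  # 0.5, the decision boundary
print(classify(3.0, b0, b1))  # 1
print(classify(1.0, b0, b1))  # 0
```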

3.4.3.2 K-Nearest Neighbor


KNN measures the distance between a new point and the existing points in the feature space and assigns the new point the most common class among its K nearest neighbors.

Figure 3.2: K-Nearest Neighbor
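A small sketch of the K-nearest-neighbor rule, assuming Euclidean distance and majority voting with `k = 3`. The 2-D points and their labels are made up for illustration and do not come from the survey.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Made-up feature vectors forming two well-separated groups.
train_points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_labels = ["Happy", "Happy", "Happy", "Anxious", "Anxious", "Anxious"]

print(knn_predict(train_points, train_labels, (1, 1)))  # Happy
print(knn_predict(train_points, train_labels, (5, 4)))  # Anxious
```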

3.4.3.3 Decision Tree
A decision tree is a supervised learning technique. Its internal nodes represent features, its branches represent decision rules, and its leaf nodes represent outcomes.

Figure 3.3: Decision Tree

3.4.3.4 Random Forest
Random forest is also a supervised learning algorithm, used for both classification and regression. It builds decision trees on randomly selected data samples, takes a prediction from each tree, and selects the final outcome by voting. It also gives a fairly good indication of feature importance.

Figure 3.4: Random Forest
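The two aggregation steps of a random forest, drawing a random sample for each tree and voting over the trees' predictions, can be sketched as below. The trees themselves are omitted: `tree_predictions` is a hypothetical list of per-tree outputs for one student, and the seed is arbitrary.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as one tree's training set."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)   # arbitrary seed, for repeatability only
rows = list(range(10))   # stand-ins for training rows
sample = bootstrap_sample(rows, rng)
print(len(sample))       # 10; some rows repeat, others are left out

# Hypothetical predictions from five trees for a single respondent.
tree_predictions = ["Neutral", "Happy", "Neutral", "Anxious", "Neutral"]
print(majority_vote(tree_predictions))  # Neutral
```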

3.4.3.5 Word2Vec
Word2Vec is a very popular word embedding technique introduced by Google in 2013. It converts the words of a sentence into numerical vectors. Google provides several models for this technique.

3.4.3.6 LSTM
LSTM is an artificial recurrent neural network architecture that works mainly on sequential data, where previous data play an important role in the final outcome. The main part of an LSTM is the cell, which has three gates: an input gate, an output gate, and a forget gate.

3.5 Design, Implementation and Simulation
The design of our model is shown below.

Figure 3.5: Design of our model

For the implementation we used data science libraries such as pandas, NumPy, and scikit-learn.

3.6 Summary
In this chapter we showed how our model was built and how each algorithm works behind the scenes. In the next chapter we examine the practical performance of our algorithms. We chose our models based on our study of previous work, in which some models performed better than others. For our NLP-based work we also took a recent approach, Word2Vec with LSTM.

CHAPTER 4: IMPLEMENTATION, TESTING AND
RESULT ANALYSIS

4.1 Introduction
This chapter describes the workflow of the proposed model, which predicts student mental health using NLP and multiclass classification algorithms. To make this model work, we use NLP-based algorithms on the open-ended questions and then multiclass classification on all the other features. We asked each student seventeen questions; four were open-ended and the rest were closed-ended. We used the Python programming language and open-source libraries: pandas for data processing, sklearn for machine learning algorithms, and matplotlib for visualization. For coding we used Anaconda and Google Colab. The dataset was collected by surveying undergraduate students.

4.2 Dataset
From reviewing previous research we learned that mental health issues depend on an individual's lifestyle. To analyze this further, we built a custom dataset by asking the participating students 17 questions focusing mainly on their recent lifestyle, sleeping routine, eating habits, etc. 108 undergraduate students participated in this research. After collecting the dataset, we categorized each person's mental condition into one of 5 categories, taking the expert opinion of a specialist.

To solve the NLP-based problem, we first combined the open-ended questions into one column. Before that, we marked each individual's response as a positive or negative opinion.

Figure 4.1 shows the open-ended questions. The “Open-ended” column combines the answers to all the open-ended questions. The “Open_target” column represents the label for each response, with 1 for a positive response and -1 for a negative response.

Figure 4.1: Open-ended Questions

After that we took all the closed-ended questions along with “Open_target” so that we could run the multiclass classification algorithms. To do so, we converted all string values to numerical values using hard-coded encoding: we wrote out every possible output for each string-type input. Figure 4.2 shows that dictionary.

Figure 4.2: Replace Dictionary

Table 4.1 shows the columns before performing the replace operation.

Table 4.1: Before performing the replace operation

Table 4.2 shows the data after all preprocessing.

Table 4.2: After preprocessing


For training we use the train-test split method to split the dataset, with 80% for training and 20% for testing or validation.
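An 80/20 split of the 108 responses can be sketched in plain Python. The helper `split_train_test` and the seed are assumptions for this illustration; scikit-learn's `train_test_split` offers the same idea with more options. Rounding the test size up gives 22 test rows and 86 training rows.

```python
import math
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split off the test fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(108))  # stand-ins for the 108 survey responses
train, test = split_train_test(rows)
print(len(train), len(test))  # 86 22
```

With only 22 test rows, each misclassified student changes the reported accuracy by about 4.5 percentage points, which is worth keeping in mind when comparing classifiers.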

4.3 System Setup


For this task we chose the Python programming language, which has a vast collection of machine learning libraries. We used the NumPy, pandas, and scikit-learn libraries. We took our NLP and classification algorithms from the scikit-learn library.

4.4 Evaluation
Our experiment was not very time-consuming or difficult to run, but accuracy was always the trickier part for all of us. Our model shows promising performance on the student mental health dataset.

To understand an algorithm's performance, we create a confusion matrix, from which the accuracy, precision, and recall are computed.

The following notation is used to define these measures:

• True-positive (TP)
• False-Positive (FP)
• True-Negative (TN)
• False-Negative (FN)

Accuracy measures how many of a model's predictions are correct. The formula is:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision is the ratio of true positives to the total number of positive predictions:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall is the ratio of true positives to the sum of true positives and false negatives:

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F1 score is the harmonic mean of precision and recall:

$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
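The four measures above can be computed from confusion-matrix counts as in the sketch below. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from our experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Share of positive predictions that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of truly positive cases that were predicted positive."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Hypothetical counts for a 22-row test set.
tp, fp, tn, fn = 8, 2, 9, 3

p = precision(tp, fp)  # 8 / 10
r = recall(tp, fn)     # 8 / 11
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.773
print(round(p, 3), round(r, 3))            # 0.8 0.727
print(round(f1_score(p, r), 3))            # 0.762
```

For a multiclass problem, each class is treated in turn as the positive class and the remaining classes as negative, which gives one precision, recall, and F1 value per class.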

4.5 Result and Discussion


We performed the sentiment analysis of the open-ended questions with LinearSVC and naïve Bayes algorithms. Each algorithm's precision, recall, accuracy, and F1-score is given in the tables below. In the following tables, ‘1’ means positive sentiment and ‘-1’ means negative sentiment.

SVM classifier:

Table 4.3: SVM Classifier (LinearSVC) For NLP

LinearSVC Classifier
y     Precision   Recall   F1-score
-1    74%         100%     85%
1     100%        38%      55%
Accuracy: 77%

Naïve Bayes Classifiers:

Table 4.4: Naive Bayes (MultinomialNB) For NLP

MultinomialNB Classifier
y          Precision   Recall   F1-score
Negative   64%         100%     78%
Positive   100%        0%       0%
Accuracy: 64%

Table 4.5: Naive Bayes (BernoulliNB) For NLP

BernoulliNB Classifier
y          Precision   Recall   F1-score
Negative   75%         86%      80%
Positive   67%         50%      57%
Accuracy: 73%

Table 4.6: LSTM For NLP

LSTM Classifier
y          Precision   Recall   F1-score
Negative   57%         100%     73%
Positive   100%        57%      73%
Accuracy: 73%

The full dataset was then run through classification algorithms: Logistic Regression, KNN, Decision Tree, and Random Forest.

Table 4.7: Logistic Regression for Multi-Class Classification

LogisticRegression Classifier
y            Precision   Recall   F1-score
Anxious      67%         33%      50%
Depressive   50%         50%      80%
Happy        75%         100%     80%
Neutral      67%         67%      67%
Suicidal     67%         100%     80%
Accuracy: 68%

Table 4.8: KNN for Multi-Class Classification

KNeighbors Classifier
y            Precision   Recall   F1-score
Anxious      60%         50%      62%
Depressive   25%         50%      40%
Happy        75%         100%     92%
Neutral      75%         50%      60%
Suicidal     100%        50%      67%
Accuracy: 68%

Table 4.9: Decision Tree for Multi-Class Classification

DecisionTree Classifier
y            Precision   Recall   F1-score
Anxious      75%         50%      44%
Depressive   33%         50%      50%
Happy        40%         33%      33%
Neutral      43%         50%      43%
Suicidal     67%         100%     80%
Accuracy: 45%

Table 4.10: Random Forest for Multi-Class Classification

RandomForest Classifier
y            Precision   Recall   F1-score
Anxious      100%        50%      67%
Depressive   67%         100%     67%
Happy        62%         83%      83%
Neutral      50%         50%      50%
Suicidal     100%        100%     100%
Accuracy: 77%

We have shown the precision, recall, F1-score, and accuracy for all of the classification algorithms used in this research. We believe that with a larger dataset, for example 2000 rows, we could achieve higher accuracy, precision, recall, and F1-score.

4.6 Summary
From the above results, we now know which algorithm works best for our research. For the NLP task, LinearSVC performs best, followed by BernoulliNB and LSTM, and then MultinomialNB. The following table summarizes this.

Table 4.11: Accuracy Comparison for NLP

For NLP
Algorithm Name   Accuracy percentage
LinearSVC        77%
MultinomialNB    64%
BernoulliNB      73%
LSTM             73%

For the multiclass classification, the Random Forest classifier works best, followed by logistic regression and the K-Neighbors classifier, and then Decision Tree. The following table summarizes this.

Table 4.12: Accuracy Comparison for Multi-Class Classification

For Multi-Class Classification
Algorithm Name       Accuracy percentage
LogisticRegression   68%
K-NearestNeighbors   68%
Decision Tree        45%
Random Forest        77%

Now we know which specific algorithm performs best for our dataset.

CHAPTER 5: STANDARDS, IMPACTS, ETHICS AND
CHALLENGES

5.1 Sustainability
The process we are implementing is sustainable. It is not very costly and requires relatively few resources to maintain. Though research in mental health has been around for quite a while, mental health analysis using NLP and classification algorithms is a relatively new and popular topic. Students are the future of a nation, and their mental health is as important as their physical health, if not more, for the betterment of society. Our study can be useful in many ways for observing and improving the overall psychological health and cognitive situation of students, which will in turn improve the socio-economic growth of our country.

5.2 Impacts on Society


The society we live in has many kinds of individuals living in it. Some are exceptionally
gifted where some are just workaholics. Some excel in life where some have difficulty
coping with the situations they face. There are many troubled individuals in our society, in
our community. Identifying those who need help will enable us to reach out to them and
provide any kind of assistance they require.

Among these people, some are aware of their situation, and some of them will try to reach out or seek help by themselves. Others will hide these facts because of anxiety or overthinking. There are also many people who need help but aren't even aware of the fact themselves. Identifying these individuals is an even greater challenge.

Therefore, if we manage to identify these individuals, ideally at an early age when they are students, then the counselling and mental support they need can be provided to them. That is where our research topic comes in. With the help of machine learning, NLP, and classification algorithms, we can reach out to these people and provide the support they need.

5.3 Ethics
The more technology progresses, the more paramount privacy concerns become. We are in the information era now. If we are not careful, our personal information can be used without our consent, or even used directly against us. In this day and age, we should always be aware and conscious of the data we collect and the information we produce. Since we are working on a student mental health analyzer, we should be careful about who gets the data and how that data is used. Only the authorities and people qualified to handle this type of sensitive information should have access to it. Our research policy and conduct will comply with national and global rules and regulations. This work will not be used in any way that could go against the ethics or morality of any community.

5.4 Challenges
This model can be very useful in its mature stage, but training a machine to recognize mental health conditions accurately, and to group people based on them, is quite challenging. Other factors, such as family issues and bad relationships, will always remain hidden from the model. For research like this to become useful and provide accurate information, it needs a large amount of data.

Along with this data come various privacy issues. The more data our model collects, the more important it becomes to protect that data from malicious groups or individuals who might use it in unethical ways or for personal gain.

5.5 Summary
Every invention in the history of mankind was intended for the betterment & flourishment
of mankind. The technologies we use today are the culmination of thousands of years of
innovation. Great minds throughout our history have made our lives so much easier &
comfortable in so many ways, but we take these for granted. But in every era, there always
were some people with sinister intentions, who dedicated their time & mind to wicked
schemes. Despite that, mankind flourished, slowly but surely, with the help of the
inventions of science.

We are next in line in this innovative endeavor. Our model can help identify those who are
in need of mental support and help. By reaching out to those who need support, we can
ensure all of our students grow up healthy both physically and mentally. Making sure of
the mental health of our future generation at their early age ensures the prosperity of a
nation.

CHAPTER 6: CONSTRAINTS AND ALTERNATIVES

6.1 Design Constraints


Our model detects and identifies the emotional states of students. It is based on the NLP and classification algorithms we learned in our machine learning course, and it requires a decent amount of processing power to work smoothly as intended. The amount of data it processes also requires a large amount of RAM.

6.2 Component Constraints


The components we used to train the model were:

• Processor: Intel Core i5-7200U (7th gen)
• Memory: 8GB (DDR4, 2133MHz)

If we use Google Colab, we do not need a high-performance machine to perform our work.

6.3 Budget Constraints


The components we used are fairly common. The configuration of our computer is stated above and is not very expensive. But the prices of different components in the market vary with time, so the budget is not fixed and has to be adapted over time.

6.4 Summary
Our model is versatile and easy to implement and use in most situations, with no learning curve for the human operator. All the learning is done by the algorithm within the system; over time it becomes better and more accurate, and in turn more useful to society.

CHAPTER 7: SCHEDULE, TASKS AND MILESTONES

7.1 Timeline
We divided our work into three parts so that we could complete the project within the three semesters we were allotted. All of the work in our project followed the execution process provided by our supervisor. In the first part, we submitted our proposal and reviewed the related work. In the next part, we collected the dataset, implemented the core concept of our model, and rechecked it with our supervisor. In the last part, our model was implemented and tested thoroughly. We followed our supervisor's instructions throughout the workflow and reported our progress.

7.2 Gantt Chart


The Gantt chart shows which task was done by the research team and when.

1st Semester

• Planning and Study
• Topic Selection
• Review Related Work
• Analyzing Existing System
• Build Prototype
• Evaluation

2nd Semester

• Model Diagram
• Model System Design
• Design Submission
• Model Analysis
• Partial Implementation
• Evaluation

3rd Semester

• Complete Implementation & Testing
• Check Issues & Resolve
• Model Finalization
• Result Evaluation
• Report Writing
• Presentation & Final Evaluation

CHAPTER 8: CONCLUSION

This work shows how we can enrich our country and society by helping our students. We learned a lot while doing this research, and our supervisor and environment were great. Our model achieved good accuracy, and we expect far better results as our dataset grows. Student mental health analysis using NLP and classification algorithms is a basic idea that can be used in much larger projects. We have seen that for NLP-based work LinearSVC and BernoulliNB give the highest accuracy, and for multiclass classification Random Forest gives the best accuracy among the algorithms tested.

8.2 Future Work and Limitations


We could not reach the accuracy we wanted from our model. In the future we will work with other algorithms that may produce greater accuracy. We can also build a mobile app using this model so that students can easily express their feelings to the counselor or management of an institute, and management can then take the necessary steps to help them overcome their mental illness.

Elisabeth et al. [30] built a resilience program that can lessen depression or anxiety that interferes with daily functioning. Goal-setting, mindfulness, and resilience skills were all part of their four-session program. Emotion regulation, mindfulness, and CBT skills improved for the 162 students who finished the program, with CBT skills influencing clinical benefits.

Broglia et al. [31] studied the effectiveness of counselling for improving students' depression, anxiety, academic distress, and trauma. They took information from over 5000 students and found that the students' mental health improved gradually after some sessions.

References

[1] Elena Sheldon, Melanie Simmonds-Buckley, Claire Bone, Thomas Mascarenhas,
Natalie Chan, Megan Wincott, Hannah Gleeson, Karmen Sow, Daniel Hind, Michael
Barkham, Prevalence and risk factors for mental health problems in university
undergraduate students: A systematic review with meta-analysis,
https://doi.org/10.1016/j.jad.2021.03.054

[2] Harmeet Kaur Kang, Christopher Rhodes, Emerald Rivers, Clifton P. Thornton, and
Tamar Rodney, Prevalence of Mental Health Disorders Among Undergraduate University
Students in the United States: A Review, https://doi.org/10.3928/02793695-20201104-03

[3] Islam MS, Sujan MSH, Tasnim R, Sikder MT, Potenza MN, van Os J (2020)
Psychological responses during the COVID-19 outbreak among university students in
Bangladesh. PLoS ONE 15(12): e0245083. https://doi.org/10.1371/journal.pone.0245083

[4] Anowara Rayhan Arusha, Raaj Kishore Biswas. Prevalence of stress, anxiety and
depression due to examination in Bangladeshi youths: A pilot study. Child Youth Serv Rev.
2020 Sep; 116: 105254. Published online 2020 Jul 18. doi: 10.1016/j.childyouth.2020.105254

[5] Mridha MK, Hossain MM, Khan MSA, et al. Prevalence and associated factors of
depression among adolescent boys and girls in Bangladesh: findings from a nationwide
survey. BMJ Open. 2021;11(1):e038954. Published 2021 Jan 17.
doi:10.1136/bmjopen-2020-038954

[6] Faisal, R.A., Jobe, M.C., Ahmed, O. et al. Mental Health Status, Anxiety, and
Depression Levels of Bangladeshi University Students During the COVID-19 Pandemic.
Int J Ment Health Addiction (2021). https://doi.org/10.1007/s11469-020-00458-y

[7] Son C, Hegde S, Smith A, Wang X, Sasangohar F. Effects of COVID-19 on College
Students’ Mental Health in the United States: Interview Survey Study. J Med Internet Res
2020;22(9):e21279. URL: https://www.jmir.org/2020/9/e21279 DOI: 10.2196/21279

[8] Velten, J., Bieda, A., Scholten, S. et al. Lifestyle choices and mental health: a
longitudinal survey with German and Chinese students. BMC Public Health 18, 632
(2018). https://doi.org/10.1186/s12889-018-5526-2

[9] O'Dea B, King C, Subotic-Kerry M, O'Moore K, Christensen H. School Counselors’
Perspectives of a Web-Based Stepped Care Mental Health Service for Schools:
Cross-Sectional Online Survey. JMIR Mental Health 2017;4(4):e55. DOI: 10.2196/mental.8369

[10] Tewari, Abha and Chhabria, Amit and Khalsa, Ajay Singh and Chaudhary,
Sanket and Kanal, Harshita, A Survey of Mental Health Chatbots using NLP (April 25,
2021). Available at SSRN: https://ssrn.com/abstract=3833914 or
http://dx.doi.org/10.2139/ssrn.3833914

[11] Rafael A. Calvo, David N. Milne, M. Sazzad Hussain and Helen Christensen, Natural
language processing in mental health applications using non-clinical texts,
https://doi.org/10.1017/S1351324916000383

[12] Chilman N, Song X, Roberts A, et al. Text mining occupations from the mental health
electronic health record: a natural language processing approach using records from the
Clinical Record Interactive Search (CRIS) platform in south London, UK. BMJ Open
2021;11:e042274. doi:10.1136/bmjopen-2020-042274

[13] Y. Mehta, S. Fatehi, A. Kazameini, C. Stachl, E. Cambria and S. Eetemadi,
"Bottom-Up and Top-Down: Predicting Personality with Psycholinguistic and Language Model
Features," 2020 IEEE International Conference on Data Mining (ICDM), 2020,
pp. 1184-1189, doi: 10.1109/ICDM50108.2020.00146.

[14] M. F. Mridha, M. S. Rahman and A. Q. Ohi, "Human Abnormality Detection Based
on Bengali Text," 2020 IEEE Region 10 Symposium (TENSYMP), 2020, pp. 1102-1105,
doi: 10.1109/TENSYMP50017.2020.9230629.

[15] S. Tiwari, A. Verma, P. Garg and D. Bansal, "Social Media Sentiment Analysis On
Twitter Datasets," 2020 6th International Conference on Advanced Computing and
Communication Systems (ICACCS), Coimbatore, India, 2020, pp. 925-927, doi:
10.1109/ICACCS48705.2020.9074208.

[16] Arora, M., Kansal, V. Character level embedding with deep convolutional neural
network for text normalization of unstructured data for Twitter sentiment analysis. Soc.
Netw. Anal. Min. 9, 12 (2019). https://doi.org/10.1007/s13278-019-0557-y

[17] Graterol, W.; Diaz-Amado, J.; Cardinale, Y.; Dongo, I.; Lopes-Silva, E.;
Santos-Libarino, C. Emotion Detection for Social Robots Based on NLP Transformers and an
Emotion Ontology. Sensors 2021, 21, 1322. https://doi.org/10.3390/s21041322

[18] Kim, K., Moon, J., & Oh, U. (2020). Analysis and Recognition of Depressive
Emotion through NLP and Machine Learning, 6(2), 449–454.
https://doi.org/10.17703/JCCT.2020.6.2.449

[19] S. B. Hassan, S. B. Hassan and U. Zakia, "Recognizing Suicidal Intent in Depressed
Population using NLP: A Pilot Study," 2020 11th IEEE Annual Information Technology,
Electronics and Mobile Communication Conference (IEMCON), 2020, pp. 0121-0128,
doi: 10.1109/IEMCON51383.2020.9284832.

[20] M. Y. Manohar and P. Kulkarni, "Improvement sarcasm analysis using NLP and
corpus-based approach," 2017 International Conference on Intelligent Computing and
Control Systems (ICICCS), 2017, pp. 618-622, doi: 10.1109/ICCONS.2017.8250536.

[21] Aggarwal, C. C., & Zhai, C. (2012). A Survey of Text Classification Algorithms.
Mining Text Data, 163–222. doi:10.1007/978-1-4614-3223-4_6

[22] Deepti Sisodia, Dilip Singh Sisodia, Prediction of Diabetes using Classification
Algorithms, https://doi.org/10.1016/j.procs.2018.05.122.

[23] Le Glaz A, Haralambous Y, Kim-Dufor DH, Lenca P, Billot R, Ryan TC, Marsh J,
DeVylder J, Walter M, Berrouiguet S, Lemey C. Machine Learning and Natural Language
Processing in Mental Health: Systematic Review. J Med Internet Res. 2021 May
4;23(5):e15708. doi: 10.2196/15708. PMID: 33944788; PMCID: PMC8132982.

[24] Yang N, Hing E: National Electronic Health Records Survey: 2015 Specialty and
Overall Physicians Electronic Health Record Adoption Summary Tables. 2017. Accessed
Aug 18, 2018.

[25] Zunic A, Corcoran P, Spasic I. Sentiment Analysis in Health and Well-Being:
Systematic Review. JMIR Med Inform 2020;8(1):e16023. doi:10.2196/16023

[26] Chen, Rong-ning; Liang, Shun-wei; Peng, Yang; Li, Xue-guo; Chen, Jian-bin; Tang,
Si-yao; Zhao, Jing-bo (2020). Mental health status and change in living rhythms among
college students in China during the COVID-19 pandemic: A large-scale survey. Journal
of Psychosomatic Research, 137, 110219. doi:10.1016/j.jpsychores.2020.110219

[27] Wolny, W. (2016). Emotion Analysis of Twitter Data That Use Emoticons and Emoji
Ideograms. In J. Gołuchowski, M. Pańkowska, C. Barry, M. Lang, H. Linger, & C.
Schneider (Eds.), Information Systems Development: Complexity in Information Systems
Development (ISD2016 Proceedings)

[28] S. Tiwari, A. Verma, P. Garg and D. Bansal, "Social Media Sentiment Analysis On
Twitter Datasets," 2020 6th International Conference on Advanced Computing and
Communication Systems (ICACCS), Coimbatore, India, 2020, pp. 925-927, doi:
10.1109/ICACCS48705.2020.9074208.

[29] Ive, J., Viani, N., Kam, J. et al. Generation and evaluation of artificial mental
health records for natural language processing. https://doi.org/10.1038/s41746-020-0267-x

[30] Elisabeth Akeman, Namik Kirlic, Ashley N. Clausen, Kelly T. Cosgrove, Timothy J.
McDermott, Lisa D. Cromer, Martin P. Paulus, Hung-Wen Yeh, Robin L. Aupperle; A
pragmatic clinical trial examining the impact of a resilience program on college student
mental health, https://doi.org/10.1002/da.22969

[31] Broglia, Emma, Williams, Charlotte, Ryan, Gemma, Percy, Alan; Profiling student
mental health and counselling effectiveness: lessons from four UK services using complete
data and different outcome measures, doi: 10.1080/03069885.2020.1860191

APPENDICES
