Personality Prediction From Social Media
Abstract—Today's world is witnessing a great increase in the use of social media. People use these platforms to share their feelings, emotions and experiences, along with a lot of personal information. All such information could be used in advantageous ways to help grow businesses and understand user needs. Personality prediction has gained a lot of attention nowadays. It studies the behavior of users, which reflects their thinking, feelings, etc. The traditional way is to conduct surveys, which is time consuming; for a large number of users, automatic prediction is needed. Users are dynamic and can have accounts on multiple platforms, so they can carry multi-context information. This survey gives an overview of the different strategies used to predict personality and behavior from the content available on social sites. The ability to predict a user's personality traits can help to build many customized services or products. Finally, the last section presents future trends and directions.

Keywords—Personality prediction; Social Media

I. INTRODUCTION

Personality is defined as the set of different characteristics, such as behavior or emotions, that result from environmental or biological factors. It reflects differences in people's thinking, behavior and feelings. Personality traits are continuous in nature: they describe how high or low a person lies on each trait along a continuum, rather than sorting people into distinct personality types. The term "personality" originally came from the Latin word "persona", which means "mask". Three criteria are used to characterize personality traits: consistency across situations, stability over time, and individual differences, meaning that different individuals have different behaviors. The field that studies human personality and its variation among individuals and groups of people is called Personality Psychology.

With the advent of technology, the use of social networking sites has increased. People use them as a platform to express and share their feelings, expectations, experiences, etc. Along with this, users often share personal information such as their profession, likes and dislikes. This information can be extracted [4]. The extracted data gives businesses the opportunity to connect with their customers, understand their needs and thus improve the quality of their services or products accordingly. It is also used to find patterns in connectivity, that is, how users are connected and how similar they are. Sentiment analysis is also performed on social media data to understand users' emotions, both positive and negative, on a given topic. Researchers have also detected and predicted health problems, such as mental illness or stress level, from users' posts.

II. PERSONALITY COMPUTING BASICS

Personality computing, which studies personality from social media data such as text and multimedia, rests on two basics: the theories used to explain, predict and understand personality, and the techniques used to obtain the results.

A. Personality Theories

Personality theories fall into four main categories [1]: psychoanalytic theory (also referred to as psychodynamic), trait theory, humanistic theory and social cognition theory. For research purposes, trait theory is the most widely used [2].

• Psychodynamic Theory: According to Freud, personality is made up of three components: the id, the ego and the superego. The id refers to the impulsive energy responsible for human needs such as nourishment and appreciation, and for urges such as hate. The superego, or conscience, symbolizes morality and social norms and represents what a person wants to be. The ego, the third component, works on the reality principle: it mediates between the demands of the id and the superego and then chooses the most realistic solution for the long term.
• Trait Theory: Trait theory suggests that human personality is composed of characteristics, or traits, that cause a person to act in a particular way. Together these traits form the blueprint for how a person behaves; examples are introversion, sociability, aggressiveness, loyalty and ambition. There are various trait theories, such as the Big Five, MBTI and Cattell's 16PF.
• Humanistic Theory: Maslow believed that personality is based on personal choice, not on nature or nurture. He suggested that people possess needs and desires that motivate them, arranged in a hierarchy of needs whose final level is self-actualization, that is, developing and growing to reach one's true potential.
• Social Cognition Theory: Social cognition theory views personality in terms of social interactions: a person's behavior is affected by the environment in which he or she lives.

Trait theory is the most widely used theory for studying personality in psychology. Unlike the other theories, it is based on finding the differences between the personalities of individuals. The combination of various traits forms a personality that is unique to every individual.

1) Big Five: Today's researchers believe that there are five personality traits. The Big Five model suggests that traits can be grouped into five classes, although researchers still disagree on the exact labels for some of them. The popular acronym for the traits is OCEAN.

• Openness: It reflects the intellectual level of a person: how curious, creative and novel a person is. It also reflects how imaginative or independent a person is. Openness is related to people's eagerness to try new things, ability to be vulnerable, and capability to think outside the box. Common traits related to openness are imagination, varied interests, originality, daring, cleverness, intellect, creativity and curiosity.
• Conscientiousness: It refers to the tendency to be steady, self-disciplined and responsible, to focus on achieving goals, and to prioritize plans over spontaneous behavior. It contrasts with careless behavior. It denotes how careful, cautious and honest a person is. It is a way to control impulses and act in a manner that is socially acceptable to everyone around. Such people are good at planning and organizing effectively. Related factors include planning, responsibility, hard work, determination, ambition and self-control. They also show good leadership qualities.
• Extroversion: People high in extroversion have high confidence, positive energy and positive emotions; they are sociable and have an urge to interact with other people. They are talkative in nature. It contrasts with reserved behavior. Factors related to this trait are energy, talkativeness, being fun-loving, friendliness and helpfulness. These people feel good about themselves as well as about the world around them. People low in extroversion are reserved and quiet.
• Agreeableness: This is the tendency to be cooperative with others instead of suspicious. Agreeable people are friendly and liked by their colleagues as well as the people around them. They do not like to fight or argue; rather, they are peacemakers. Humility, politeness, helpfulness, patience, kindness and sensitivity are the traits that come under the umbrella of agreeableness.
• Neuroticism: It contrasts with a confident or secure nature. People high in neuroticism are sensitive or nervous. This trait is characterized by sadness, moodiness and emotional instability. Such people easily experience negative emotions and feelings, such as anger, anxiety, depression and negativity. It refers to the tendency to experience negative emotional states and to see oneself and the surrounding world negatively. Being temperamental and anxious are some related traits.

With advances in technology there has been a wide increase in the use of social media such as Facebook, Twitter and Instagram. The information shared by users can be used to understand their personality, which helps to understand their needs and thus to suggest services and facilities or to predict their behavior in a given situation.

B. Techniques Used

Many approaches have been used for personality prediction, as shown in Fig. 1.

Fig. 1. Approaches Used

• Questionnaire: The earliest approach to personality prediction was a questionnaire. Users were asked multiple-choice questions and had to select one option for each. The questions differed for different personality traits. Each selected option was rated on some scale, and the final score for each trait was obtained by adding up the scores of the questions related to that trait.
• Semantic Similarity: Here each trait has a predefined vocabulary, or dictionary of words. The words in a user's posts are checked for semantic similarity against this vocabulary, i.e., words with similar meanings get the same score. The distance is computed and the trait is predicted from it.
• Machine Learning: Classical approaches cannot handle vast amounts of data, which is one of the advantages of machine learning algorithms. ML can also find patterns in the data that might not be visible to humans.
• Deep Learning: Deep learning [3] can be used to predict personality traits with more accuracy. Its processing is loosely inspired by the way the human brain works. The feature extraction process is automated, so there is no manual feature-engineering overhead [5].

III. LITERATURE REVIEW

Giménez et al. [7] focused on personality prediction as part of the author profiling task. They used the PAN-AP-2015 corpus, which was collected from Twitter users. The corpus includes four languages, but this paper focused on English only. Users took an online self-assessment test, and each trait score lay between -0.5 and 0.5. The Big Five model was used for the traits. GloVe representations in vector form were then used for word embedding. For short
input data, many zeros were padded, because the CNN requires an input of fixed size. Different filters were used for the convolution layers. All the outputs were merged together and a pooling layer was applied. ReLU is used as the activation function. A fully connected neural network gives an output of five neurons, one for each trait. A deeper CNN could also be implemented.

The authors of [8] aim to predict the personality of Arabic-speaking Twitter users in Egypt. They collected a data set, AraPersonality, from Twitter users writing in Arabic dialect. A questionnaire consisting of several multiple-choice questions with five choices each was translated into Arabic and then filled in by the users. A score was assigned to each choice selected by a user, depending on whether the question is directly or inversely proportional to the Big Five personality traits, abbreviated as OCEAN. Apart from the questionnaire, the users' feeds were also collected. These feeds were then pre-processed and cleaned by removing noisy data such as user names and e-mails, and some non-Arabic words were converted to Arabic. Normalization was done to keep all words in one form. The data was then divided into training and test data, and TF-IDF was calculated for every user. Three supervised machine learning algorithms were used: Decision Trees, Support Vector Machine and Multinomial Naïve Bayes.

M. Hassanein et al. [9] presented an approach to predict personality on the basis of semantics. They used the Big Five model on the MyPersonality data set. A vector space model represents the user text as a vector that holds the count of every word in the text. A similarity measure based on the WordNet database is used to measure semantics.

The authors of [10] proposed a model for text analysis that predicts the personality of brands on social media platforms. The Big Five model was used to predict brand personality. This information could help a brand plan its marketing strategies as well as improve its relations with customers. The MyPersonality data set was used along with one created from brand pages, and features were extracted from both data sets. Feature selection was done with two approaches, Pearson correlation and Gradient Boosting, and three machine learning approaches were applied: Support Vector Regression (SVR), Gradient Boosting and a Feed-Forward Neural Network. The XGB (gradient boosting) models performed best in predicting personality.

Sun et al. [11] proposed a new model named 2CLSTM, a bidirectional Long Short-Term Memory network interconnected with a CNN, to find the personality of users. It focuses on the structure of text, as this can be an important feature. The Big Five model with five traits was used. Two data sets were used for the experiments: a long-text data set of 2,467 essays tagged with their authors' traits, and a short-text data set of YouTube vloggers. The GloVe algorithm was used for word embedding. The LSTM has a self-loop as well as the RNN loop, and it is bidirectional so as to extract more features. The paper also proposes the concept of Latent Sentence Groups (LSG), i.e., several sentences that are closely related to each other; a CNN was used to study such latent features. A max pooling layer is used after the LSTM to get sentence vectors, and a softmax classifier was used for classification. Various contrast models, such as TF-IDF with Bayes, 2- and 3-dimensional CNNs and a single LSTM, were used to compare the results with the proposed model, which proved to perform better.

The authors of [12] presented a system that analyzes the personality traits of Facebook users from their status posts. The Big Five personality model was used. They used the MyPersonality data set, which has 250 users and about 10,000 status updates from these users. After extraction, the posts were pre-processed by removing links, symbols, etc., and all words were converted to lowercase. A spelling correction algorithm was used on real-time data to correct misspellings in the posts. Posts also contained symbols such as hashtags (#) and emoticons; these were removed while keeping the words as they are. TF-IDF was calculated to extract keywords from the documents, forming a feature vector. Because this vector was too large, Principal Component Analysis was used to reduce its size and keep only relevant features. The machine learning algorithms KNN and SVM were used, and KNN was best for classifying the traits.

J. Jia [13] focused on mental health problems such as stress and depression. For stress detection, features were extracted at different granularities to describe each user, at the tweet level and at the user level. The author also created a benchmark data set for multimodal detection. At the tweet level, linguistic, visual and social features were extracted; at the user level, the user's posting behavior and social features such as influence were extracted. A 1-dimensional CNN was applied with Cross Auto-Encoder units. The work also analyzed user content and posting style. According to the paper, there is also a correlation between a user's mental health and social concepts such as structure, influence and engagement.

According to Di Xue et al. [14], language is a common and effective way for people to express their thoughts and feelings for others to understand, so text can reflect personality traits. They proposed a two-level hierarchical deep learning architecture called AttRCNN, inspired by RCNN, for sentence vectorization to extract semantic vectors. The Big Five model was used on data from the MyPersonality project, which had 11 million Facebook users. SVR, Gradient Boosting and a Random Forest classifier were used.

J. Yu et al. [15] note that automatic prediction of personality from a user's social activities helps to predict his or her environment and has some important applications. The authors used a deep learning approach to predict personality based on the Big Five model. The data set was a subset of 250 users released for a shared task. Pre-processing techniques were applied to it, and the skip-gram method was used for word embedding. A CNN with average pooling, an RNN and a fully connected neural network were used, and the results were compared with machine learning algorithms.
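As a minimal sketch of the TF-IDF representation used in [8] and [12], combined with the cosine-similarity KNN classifier reported in [12] and [26], the Python below weights each user's words and labels a new user by majority vote among the most similar training users. The tokenizer, the smoothed IDF formula, k = 3, and the toy posts and labels (high_E/low_E for extroversion) are assumptions for illustration only; none of them are taken from the surveyed papers or data sets.

```python
import math
from collections import Counter

# Hypothetical toy data: NOT from MyPersonality or any surveyed data set.
TOY_DOCS = [
    "love meeting friends at the party tonight",
    "party with friends again so much fun",
    "great fun talking to everyone at the event",
    "stayed home alone reading a quiet book",
    "quiet evening alone with my book",
    "prefer a quiet walk alone",
]
TOY_LABELS = ["high_E", "high_E", "high_E", "low_E", "low_E", "low_E"]


def tokenize(text):
    """Simplified pre-processing: lowercase, split on whitespace, keep alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def fit_idf(docs):
    """Smoothed IDF (an assumed variant): idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector {term: raw count * idf}; terms unseen during fitting are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query, train_docs, train_labels, idf, k=3):
    """Label the query by majority vote among its k most cosine-similar training documents."""
    q = tfidf(query, idf)
    sims = [(cosine(q, tfidf(d, idf)), label) for d, label in zip(train_docs, train_labels)]
    sims.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep input order
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    idf = fit_idf(TOY_DOCS)
    print(knn_predict("fun party with friends", TOY_DOCS, TOY_LABELS, idf))
```

Running the script prints `high_E` for the toy query. The surveyed systems add further pre-processing (normalization in [8], stop-word removal and stemming in [26]) and, as in [12], may reduce the dimensionality of the TF-IDF vectors with PCA before classification.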
Skowron et al. [16] used the traces left by users on digital platforms such as social media. Users with a good reputation in the US were selected and asked to fill in a questionnaire, whose answers were used to compute trait scores; for the same users, Instagram and Twitter posts were collected and pre-processed. The paper performs multimodal personality trait regression that fuses the users' information from the two SNSs, and evaluates it against models trained on data acquired from a single SNS.

Tadesse et al. [17] used the Big Five model to predict the personality of users from the MyPersonality data set. First, text features reflecting language usage, expressions and topic counts were extracted from the posts using the LIWC and SPLICE dictionaries. Second, social interaction features such as connectivity and network size were extracted. Pearson's correlation was used to measure the strength of the relationship between variables and to select the important features. XGBoost was used as the classifier along with three baseline algorithms: Logistic Regression, Gradient Boosting and SVM. XGBoost gave the best results in predicting the personality traits.

The paper [18] used a new machine learning approach called Label Distribution Learning (LDL). The data was collected from Sina Weibo, a microblogging site, from each user's updates, statuses, etc., and a 44-question test called the BFI was administered to obtain the users' personality scores. Features were extracted in three categories: static features that change little over time, such as gender and name; dynamic features that change over time, such as followers; and content features such as blog, linguistic and psychological features. Every instance is given a real-valued label. Eight LDL algorithms, including KNN-, Bayes- and SVM-based ones, were used and compared with baseline algorithms such as M5' Rules, Random Forest and Tree, ZeroR, Gaussian, Linear Regression, Support Vector Regression and MLP. Label distribution learning with Support Vector Machine gave the highest accuracy.

T. Yo et al. [19] predicted personal attributes such as age, gender and occupation from the Twitter data of 120 users collected using the Twitter API. The MeCab tool was used to extract words from the posts, and the skip-gram method of the Word2Vec tool was used to create the word embeddings. Various ML algorithms, namely Linear SVC, Random Forests, KNN and AdaBoost, were used along with a fully connected neural network whose parameters were varied to obtain optimized results. The attributes showed different results for different algorithms: gender and occupation were predicted more accurately using Linear SVC and deep learning, whereas age groups were predicted more accurately using Random Forest and AdaBoost.

Wald et al. [22] used data mining techniques to predict personality on the social networking site Facebook. The goal of the paper was to find the topmost and bottommost users exhibiting each trait. They used data from a previously performed experiment, called the Big 5 Experiment, conducted by the Online Privacy Foundation. The experiment surveyed 537 Facebook users, who answered questions that categorize them according to the Big Five model. In addition, information that uniquely defines every user was collected: 32 attributes such as sex, age and comments, and 80 text attributes. LIWC was used to find positive and negative emotions in the text. Numerical algorithms such as Linear Regression, REPTree and Decision Tables were used to predict the top and bottom of the list, and REPTree models were accurate in predicting users for all traits. Further, automatic data mining techniques could be used for prediction, thereby reducing the number of attributes used.

The authors of [24] used a semi-supervised method to exploit a large amount of unlabeled data in order to improve prediction accuracy. The Pseudo Multi-view Co-training algorithm was used. After pre-processing the MyPersonality data set, linguistic features were extracted using techniques such as LIWC and n-grams. Word clouds were built with Wordle to show how words are linked with particular personality traits, displaying the words with the highest correlation.

The authors of [25] proposed a method to predict the personality of Facebook users from their digital footprints. The Big Five model was used as the trait model. Two data sets were used: one collected from more than 90,000 Facebook users, and the other containing the personality traits of these users. The extracted data had more than 600 features; to keep only the necessary ones, the LASSO algorithm was used to select the main features. The model gave the best accuracy for Openness and Extraversion and the lowest for Agreeableness, while Conscientiousness and Neuroticism had moderate accuracy.

The system in [26] is a web application that predicts user personality from the user's Twitter posts. The MyPersonality data set was used with slight modifications, and an Indonesian data set was created by translating it. A user's tweets were combined into a single document. The text was represented in vector form after pre-processing such as tokenization, stop-word removal and stemming. The classification is multi-label, since a person has a combination of traits, so a binary classifier was built for every trait. A Multinomial Naïve Bayes (MNB) model, based on the multinomial distribution with word occurrences or word weights as features, was used to classify. KNN with cosine similarity for document classification and SVM were also used. MNB gave the best accuracy.

IV. FUTURE TRENDS

Predicting personality from social media data is one of the research areas that requires a lot of attention. Much work has already been done, but more is needed to increase prediction accuracy, which requires improvements in various aspects of the system, such as algorithms, feature extraction and data sets.

According to the authors of [7], both shallower and deeper CNNs, which have not been implemented previously in natural language processing, could be implemented. Moreover,
the predictability of the model should be checked on various data sets. Increasing the size of the data set and improving feature extraction could improve accuracy [8]. Multivariate regression could be used to predict all the traits at once instead of a single regression per trait. It is also necessary to check whether a person's personality and a brand's personality are similar; this could be used in advantageous ways to provide and recommend services as well as to help plan marketing strategies [10]. Apart from text, other multimedia data such as photos and videos can be used [11]. A dictionary could be created for the patois words used on social media to predict personality [12].

To determine a user's mental health status properly, offline personalized measurement must be done so that proper care can be provided [13]. Regression algorithms designed to predict personality could achieve higher accuracy when given semantic features as input [14]. As the amount of data increases, labeling it becomes difficult or impossible, so unsupervised learning [15] can be used to predict personality by using external knowledge to cluster the text. A larger data set can help improve accuracy and help recommend services, movies, music, etc. to users [17]. Personality varies over time, so the data collected should include the user's previous posts.

V. CONCLUSION

Users' behavior on social media sites can help predict their traits based on various personality models. Earlier, the questionnaire method was used, which can be a costly and time-consuming process. The goal of this paper is to summarize the work done on predicting personality from social media text and to summarize the future trends. Table I shows an overview of the current research techniques; the analysis shows the various techniques and models used. By working on the future directions, prediction accuracy can be increased, and the predictions can be used to provide customized services and other recommendations.

REFERENCES

[1] E. Diener and R. Lucas, "Personality Traits," Noba, 2019. [Online]. Available: https://nobaproject.com/modules/personality-traits [Accessed: 30-Sep-2019].
[2] J. Thompson, Bizfluent.com, 2019. [Online]. Available: https://bizfluent.com/info-7745856-four-theories-personality.html [Accessed: 30-Sep-2019].
[3] En.m.wikipedia.org, "Deep learning," 2019. [Online]. Available: https://en.m.wikipedia.org/wiki/Deep_learning [Accessed: 30-Sep-2019].
[4] En.wikipedia.org, "Social media mining," 2019. [Online]. Available: https://en.wikipedia.org/wiki/Social-media-mining [Accessed: 30-Sep-2019].
[5] A. Bonner, "The Complete Beginners Guide to Deep Learning," Medium, 2019. [Online]. Available: https://towardsdatascience.com/intro-to-deep-learning-c025efd92535 [Accessed: 30-Sep-2019].
[6] M. K. Hayat, A. Daud, A. A. Alshdadi, A. Banjar, R. A. Abbasi, Y. Bao, and H. Dawood, "Towards Deep Learning Prospects: Insights for Social Media Analytics," IEEE Access, vol. 7, pp. 36958–36979, 2019.
[7] M. Giménez, R. Paredes, and R. Paolo, "Personality Recognition Using Convolutional Neural Networks," Springer Nature Switzerland AG, pp. 313–323, 2018.
[8] M. S. Salem, S. S. Ismail, and M. Aref, "Personality Traits for Egyptian Twitter Users Dataset," Proceedings of the 2019 8th International Conference on Software and Information Engineering (ICSIE '19), 2019.
[9] M. Hassanein, W. Hussein, S. Rady, and T. F. Gharib, "Predicting Personality Traits from Social Media using Text Semantics," 2018 13th International Conference on Computer Engineering and Systems (ICCES), 2018.
[10] R. B. Tareaf, P. Berger, P. Hennig, and C. Meinel, "Personality Exploration System for Online Social Networks: Facebook Brands As a Use Case," 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2018.
[11] X. Sun, B. Liu, J. Cao, J. Luo, and X. Shen, "Who Am I? Personality Detection Based on Deep Learning for Texts," 2018 IEEE International Conference on Communications (ICC), 2018.
[12] M. Vaidhya, B. Shrestha, B. Sainju, K. Khaniya, and A. Shakya, "Personality Traits Analysis from Facebook Data," 21st International Computer Science and Engineering Conference (ICSEC), 2017.
[13] J. Jia, "Mental Health Computing via Harvesting Social Media Data," Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018.
[14] D. Xue, L. Wu, Z. Hong, S. Guo, L. Gao, Z. Wu, X. Zhong, and J. Sun, "Deep learning-based personality recognition from text posts of online social networks," Applied Intelligence, vol. 48, no. 11, pp. 4232–4246, May 2018.
[15] J. Yu and K. Markov, "Deep learning based personality recognition from Facebook status updates," 2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST), 2017.
[16] M. Skowron, M. Tkalčič, B. Ferwerda, and M. Schedl, "Fusing Social Media Cues," Proceedings of the 25th International Conference Companion on World Wide Web (WWW '16 Companion), 2016.
[17] M. M. Tadesse, H. Lin, B. Xu, and L. Yang, "Personality Predictions Based on User Behavior on the Facebook Social Media Platform," IEEE Access, vol. 6, pp. 61959–61969, 2018.
[18] D. Xue, Z. Hong, S. Guo, L. Gao, L. Wu, J. Zheng, and N. Zhao, "Personality Recognition on Social Media With Label Distribution Learning," IEEE Access, vol. 5, pp. 13478–13488, 2017.
[19] T. Yo and K. Sasahara, "Inference of personal attributes from tweets using machine learning," 2017 IEEE International Conference on Big Data (Big Data), 2017.
[20] V. Zhou, "Machine Learning for Beginners: An Introduction to Neural Networks," Towards Data Science, 05-Mar-2019. [Online]. Available: https://towardsdatascience.com/machine-learning-for-beginners-anintroduction-to-neural-networks-d49f22d238f9 [Accessed: 30-Sep-2019].
[21] H. Thilakarathne, "Artificial Neural Networks with Net# in Azure ML Studio," NaadiSpeaks, 08-Nov-2017. [Online]. Available: https://naadispeaks.wordpress.com/2017/11/08/artificial-neural-networkswith-net-in-azure-ml-studio/ [Accessed: 30-Sep-2019].
[22] R. Wald, T. Khoshgoftaar, and C. Sumner, "Machine prediction of personality from Facebook profiles," 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), 2012.
[23] N. Majumder, S. Poria, A. Gelbukh, and E. Cambria, "Deep Learning-Based Document Modeling for Personality Detection from Text," IEEE Intelligent Systems, vol. 32, no. 2, pp. 74–79, 2017.
[24] H. Zheng and C. Wu, "Predicting Personality Using Facebook Status Based on Semi-supervised Learning," Proceedings of the 2019 11th International Conference on Machine Learning and Computing (ICMLC '19), 2019.
[25] Laleh and R. Shahram, "Analyzing Facebook Activities for Personality Recognition," 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 2017.
[26] B. Y. Pratama and R. Sarno, "Personality classification based on Twitter text using Naive Bayes, KNN and SVM," 2015 International Conference on Data and Software Engineering (ICoDSE), 2015.
TABLE I. OVERVIEW OF CURRENT RESEARCH TECHNIQUES

No. | Author | Model and data set | Techniques used
2 | Salem [8] | Big Five; AraPersonality, with data collected from all users' posts | A questionnaire was filled in and scores were assigned. TF-IDF was calculated for every user. The machine learning algorithms Decision Trees, Support Vector Machine and Multinomial Naïve Bayes were used.
3 | Hassanein [9] | Big Five; MyPersonality | Introduces a model that calculates the semantic similarity between the user's posted text and the words that describe each personality trait. A vector space model is used.
4 | Tareaf [10] | Big Five; MyPersonality and a data set created from brand pages | Selected features using Pearson's correlation and Gradient Boosting. Used three approaches to predict personality: Gradient Boosting, Support Vector Regression and Neural Networks.
5 | Sun [11] | Big Five; essay data and YouTube vloggers | GloVe for word embedding. Proposed an LSTM interconnected with a CNN; the CNN was used to study a new concept, Latent Sentence Groups, i.e., sentences that are closely related to each other. Contrast models such as TF-IDF, 2-D CNN and 3-D CNN were used to compare results with the LSTM.
6 | Vaidhya [12] | Big Five; MyPersonality | TF-IDF was used to extract keywords. PCA was used to extract important features. The KNN and SVM machine learning algorithms were used.
7 | Jia [13] | Data collected from Twitter and Weibo | Mental health problems of stress and depression were detected. LIWC was used. A DNN was used to calculate the personality trait scores.
8 | Xue [14] | Big Five; MyPersonality | Proposed a two-level hierarchical deep learning architecture called AttRCNN, inspired by RCNN, for sentence vectorization. SVR, Gradient Boosting and a Random Forest classifier were used.
9 | Yu [15] | Big Five; MyPersonality | Applied deep learning to learn a suitable data representation. Used neural networks such as a CNN, an RNN and a fully connected network. The CNN with average pooling gave the best prediction results.
10 | Skowron [16] | Big Five; crawled user data | Extracted image, linguistic and meta features related to reputation and popularity. Image features included those used in emotion detection. Linguistic features were extracted using LIWC, etc.
11 | Tadesse [17] | Big Five; MyPersonality | Text features extracted using LIWC and SPLICE, along with social interaction behavior features such as connectivity and network size. XGBoost, Logistic Regression, Support Vector Machine and Gradient Boosting were used.
12 | Xue [18] | Big Five; user data collected from Sina Weibo, a microblogging site | Uses the ML technique of label distribution learning, giving every instance a real-valued label vector. Eight LDL algorithms were used along with baseline algorithms such as SVM and Random Forest. LDL with SVM gave the best results.
13 | Yo [19] | Twitter data of 120 users collected using the Twitter API | Predicted age, gender and occupation. Used AdaBoost, Linear SVC and Random Forest along with a fully connected neural network. AdaBoost and Random Forest best predicted age, whereas the NN and Linear SVC best predicted gender and occupation.
14 | Wald [22] | Big Five; experiment by the Online Privacy Foundation | LIWC is used to detect positive and negative emotions in text. Numerical algorithms such as Linear Regression, REPTree and Decision Tables were used to predict the top and bottom of the list. REPTree models were accurate in predicting users for all traits.
15 | Zheng [24] | Big Five; MyPersonality | LIWC and n-grams to extract linguistic features from the text. Used a semi-supervised co-training algorithm called PMC. Also showed the words with the highest correlation to particular personality traits as a word cloud built with Wordle.
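The questionnaire scoring described in Section II-B, and in [8] (row 2 of Table I), where each choice is scored according to whether the question is directly or inversely proportional to a trait, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the item identifiers, their trait assignments and the 1 to 5 answer scale are hypothetical placeholders, not items from the BFI or any other published inventory.

```python
# Hypothetical answer key: item id -> (trait, direction).
# direction = +1 if agreeing with the item indicates MORE of the trait,
# direction = -1 if the item is inversely proportional (reverse-keyed).
ITEM_KEY = {
    "q1": ("E", +1),
    "q2": ("E", -1),
    "q3": ("N", +1),
    "q4": ("N", -1),
}
SCALE_MIN, SCALE_MAX = 1, 5  # assumed five-choice scale


def score_questionnaire(answers, item_key=ITEM_KEY):
    """Sum per-trait scores, reversing the scale for inversely keyed items."""
    totals = {}
    for item, choice in answers.items():
        if not SCALE_MIN <= choice <= SCALE_MAX:
            raise ValueError(f"{item}: choice {choice} outside {SCALE_MIN}-{SCALE_MAX}")
        trait, direction = item_key[item]
        # A reverse-keyed answer of 1 counts as 5, 2 as 4, and so on.
        value = choice if direction > 0 else SCALE_MIN + SCALE_MAX - choice
        totals[trait] = totals.get(trait, 0) + value
    return totals


if __name__ == "__main__":
    print(score_questionnaire({"q1": 5, "q2": 1, "q3": 2, "q4": 4}))  # {'E': 10, 'N': 4}
```

Summing the item scores follows the description in Section II-B; dividing each total by the number of items for that trait is a common alternative when traits have different numbers of questions.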