Identifying Personality Trait Using Social Media
Approach
Abstract
Social media is no longer a new concept. With the growing penetration of the
internet and low-cost smartphones, access to social media has become both a
trend and a necessity for many. Collecting a large number of likes and comments
and having posts shared further has become a matter of social status and
prestige for today's youth. What makes these features distinctive is that likes,
comments and shares are instant responses of users, are publicly available, and
can be accessed by all the friends of a person. This data is not considered
private data. The paper suggests a data mining approach to predict the
personality trait of an individual from the likes, comments and shares available
on social media. The suggested framework takes the likes, comments and shares as
input and processes them to map the user to a personality trait. The framework
is derived from the Big Five personality traits.
Existing System
Social media is popular and is growing dynamically. The usage of social media is
increasing day by day: as of today there are around 500 million users on
Facebook alone, compared to a mere 115 million users across all social media
around a decade ago. As information on social media is available to the public,
it can be extracted for different purposes. There are multiple methods to
understand personality traits, but these methods are time consuming. There is
therefore a need for a quick method or framework that can be executed easily and
that works from the natural habits and instant responses of an individual.
Social media is one of the most easily accessible ways to understand the natural
behavior of an individual and the user's likes and dislikes, so information
extracted from social media can be linked to the personality traits of its
users. Past studies have shown that, instead of classifying sentiments as
positive, negative or neutral, they can be placed on an n-point scale such as
very good, good, satisfactory, bad and very bad. Each sentiment then falls into
exactly one category when the text of the comments is classified. Different
classifiers are used to classify the text, and a comparative study shows that
using multiple classifiers in a hybrid manner can improve the effectiveness of
sentiment analysis.
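As an illustration of the n-point scale, the following Python sketch maps a
numeric sentiment score to one of five categories. The score range of -1 to 1
and the cut-off values are assumptions made for this example; they are not taken
from the studies referred to above.

```python
from bisect import bisect_right

# Five-point scale from the text, ordered from most negative to most positive.
LABELS = ["very bad", "bad", "satisfactory", "good", "very good"]

# Hypothetical cut-offs on a score assumed to lie between -1 and 1.
CUTOFFS = [-0.6, -0.2, 0.2, 0.6]


def to_scale(score):
    """Return the single category that a sentiment score falls into."""
    # bisect_right counts how many cut-offs are <= score, which is the
    # index of the matching label.
    return LABELS[bisect_right(CUTOFFS, score)]


if __name__ == "__main__":
    for s in (-1.0, -0.3, 0.0, 0.4, 0.9):
        print(s, "->", to_scale(s))
```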
Proposed System
The purpose of the study is to propose a theoretical framework that can be used
to identify the personality trait of a social media user. The study considers
different actions by the Facebook user, e.g., likes on a post, sharing of a post
and comments on a post. Predicting the personality trait involves accepting the
user actions and applying text analytics methods and algorithms to obtain the
percentage of each type of personality trait. The personality trait with the
highest percentage can be considered the personality type of the user concerned.
A rule-based classifier can be used to analyze the sentiments associated with
the text of an individual's comments. The classifier consists of rules in which
the condition is a combination of the words/phrases included in the comments and
the result is the associated sentiment.
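A minimal Python sketch of such a rule-based classifier is given here. The
rules, words and sentiment labels are illustrative assumptions and are not the
rules used in the study. A rule fires when every word in its condition occurs in
the comment, and the first matching rule decides the result.

```python
import re

# Hypothetical rules: (words that must all appear in the comment, sentiment).
# More specific rules are listed first because the first match wins.
RULES = [
    ({"not", "good"}, "negative"),
    ({"love", "new"}, "positive"),
    ({"good"}, "positive"),
    ({"hate"}, "negative"),
]


def classify_comment(comment, rules=RULES, default="neutral"):
    """Return the sentiment of the first rule whose condition is satisfied."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    for condition, sentiment in rules:
        if condition <= words:  # every word of the condition is present
            return sentiment
    return default


if __name__ == "__main__":
    print(classify_comment("This is not good!"))
    print(classify_comment("I love trying new food"))
    print(classify_comment("See you tomorrow"))
```

Ordering the rules from specific to general is a design choice of this sketch:
it lets "not good" override the more general rule for "good".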
Implementation
The data collected from social media is the text of the comments posted by the
user. Filtering the text requires either phrase- and pattern-based techniques or
term-based techniques. Here, the phrase-based technique is preferred because
phrases carry more semantic information than terms, so better performance can be
expected. The main aim of filtering is to remove redundant or irrelevant data.
The result is clean data that can be processed more effectively. First, the
probable phrases that can occur in the comments, and their synonyms, are listed.
This list helps in extracting those phrases from the text. In addition, a
dictionary of words such as ‘a’, ‘an’, ‘the’, ‘you’, ‘of’ and ‘over’ is built so
that such uninformative text is not processed.
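The filtering step can be sketched in Python as follows. The phrase list is a
made-up placeholder and the stop-word dictionary contains only the example words
named above; a working system would need much larger, curated lists.

```python
import re

# Hypothetical list of probable phrases (assumption for illustration only).
PHRASES = ["look forward to", "can not wait", "fed up"]

# Dictionary of words that should not be processed further.
STOP_WORDS = {"a", "an", "the", "you", "of", "over"}


def filter_comment(comment):
    """Return (listed phrases found, remaining tokens without stop words)."""
    # Normalise to lower-case words separated by single spaces.
    text = " ".join(re.findall(r"[a-z']+", comment.lower()))
    found = []
    for phrase in PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            found.append(phrase)
            text = re.sub(pattern, " ", text)  # do not count its words again
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return found, tokens


if __name__ == "__main__":
    print(filter_comment("I look forward to the trip of a lifetime!"))
```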
Data stemming operates on the phrases extracted by data filtering. Stemming is
the process of reducing words to their stem or root form. Sets of words that can
be treated as equivalent are identified, and their multiple occurrences are
replaced with the common root form.
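The idea can be illustrated with a deliberately crude suffix-stripping stemmer
in Python. The suffix list and the minimum stem length are assumptions of this
sketch; it is not the stemmer used in the study, and an established algorithm
such as the Porter stemmer would handle many more cases.

```python
# Suffixes are tried in this order; the list is a simplification.
SUFFIXES = ("ing", "ed", "ly", "es", "s")
MIN_STEM_LENGTH = 3  # avoid reducing short words such as "is" or "bus"


def stem(word):
    """Strip the first matching suffix if a long enough stem remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def stem_all(words):
    """Replace every word with its stem so equivalent words coincide."""
    return [stem(w) for w in words]


if __name__ == "__main__":
    print(stem_all(["Talking", "talked", "talks", "friendly", "is"]))
```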
In this module, the input is provided for simplifying the sentiments. The
sentiments associated with the text used in a comment may be openness to
experience, conscientiousness, extraversion, agreeableness or neuroticism. The
input here is the stem or root form of the words or phrases used in the
comments, which makes it easier to identify the corresponding sentiments.
The personality trait repository is used to associate the Big Five personality
traits with the corresponding attributes. The attributes considered here are
openness to experience, conscientiousness, extraversion, agreeableness and
neuroticism. Each attribute included in the repository is in turn linked with
its synonymous words. The information retrieved is the text of the comments,
which is composed of phrases and certain adjectives. These phrases and
adjectives are the input to the repository, where they are matched against the
synonymous words.
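One possible shape for such a repository is a dictionary from each trait to its
linked words, as in the Python sketch below. The trait names are the standard
Big Five names; the words linked to each trait are invented for illustration and
are not taken from the study.

```python
# Hypothetical repository: trait -> synonymous words linked to it.
TRAIT_REPOSITORY = {
    "openness to experience": {"curious", "creative", "imaginative"},
    "conscientiousness": {"organized", "careful", "punctual"},
    "extraversion": {"outgoing", "talkative", "party"},
    "agreeableness": {"kind", "helpful", "thanks"},
    "neuroticism": {"worried", "anxious", "stressed"},
}


def traits_for(term):
    """Return the traits whose word list contains the given term."""
    term = term.lower()
    return [trait for trait, words in TRAIT_REPOSITORY.items() if term in words]


def match_comment_terms(terms):
    """Map every recognised term of a comment to its associated traits."""
    return {t: traits_for(t) for t in terms if traits_for(t)}


if __name__ == "__main__":
    print(match_comment_terms(["Curious", "table", "anxious"]))
```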
Algorithm Implementation
Classification Algorithm
The study uses pattern/rule-based classifiers as the classification algorithm.
A pattern/rule-based classifier determines the word patterns that are most
likely related to the different classes. Researchers have constructed a set of
rules where each rule is associated with a keyword. A person cannot be strictly
categorized as belonging to exactly one personality trait. Instead, a person can
have a combination of the characteristics that belong to the five personality
traits, as explained in Table 1. The percentage of those characteristics varies
with the responses of the user to posts. The personality trait with the highest
weightage among the five can be treated as the user's personality trait.
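The percentage computation described above can be sketched in Python. The
sketch assumes that an earlier step has already produced one trait label per
matched word or phrase; that input format, and the function names, are
assumptions of this example.

```python
from collections import Counter

TRAITS = [
    "openness to experience",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
]


def trait_percentages(matched_traits):
    """Turn a list of trait labels into a percentage per Big Five trait."""
    counts = Counter(matched_traits)
    total = sum(counts[t] for t in TRAITS)
    if total == 0:
        return {t: 0.0 for t in TRAITS}
    return {t: 100.0 * counts[t] / total for t in TRAITS}


def dominant_trait(percentages):
    """Return the trait with the highest percentage, or None if all are zero."""
    if not any(percentages.values()):
        return None
    return max(percentages, key=percentages.get)


if __name__ == "__main__":
    labels = ["extraversion", "extraversion", "agreeableness", "neuroticism"]
    pct = trait_percentages(labels)
    print(pct)
    print(dominant_trait(pct))
```

Keeping the full percentage profile alongside the dominant trait preserves the
point made in the text that a user shows a mix of the five traits.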
Decision trees are powerful and popular tools for classification and prediction.
A decision tree represents rules that can be easily understood by anyone and
that can also be used in a database system. The algorithm requires an
attribute-value description of the data and pre-defined classes. The properties
of the attributes are collected and provided as input to the decision tree
algorithm, together with the pre-defined classes from the classification
algorithm. The rules defined here are used to derive results in terms of
personality traits, which can then be used to create a personality trait report.
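To show how a decision tree represents rules over attribute-value descriptions,
the Python sketch below walks a small hand-written tree. The attributes
(comment_tone, shares_often), their values and the tree itself are hypothetical;
in practice the tree would be learned from labelled data rather than written by
hand.

```python
# A node is either a class label (str) or a dict of the form
# {"attribute": name, "branches": {attribute value: subtree}}.
TREE = {
    "attribute": "comment_tone",
    "branches": {
        "positive": {
            "attribute": "shares_often",
            "branches": {"yes": "extraversion", "no": "agreeableness"},
        },
        "negative": "neuroticism",
        "neutral": "conscientiousness",
    },
}


def classify(record, node=TREE):
    """Follow the branch matching each attribute value until a leaf is hit."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


def tree_to_rules(node=TREE, conditions=()):
    """List every root-to-leaf path as (list of conditions, class label)."""
    if not isinstance(node, dict):
        return [(list(conditions), node)]
    rules = []
    for value, subtree in node["branches"].items():
        step = (node["attribute"], value)
        rules.extend(tree_to_rules(subtree, conditions + (step,)))
    return rules


if __name__ == "__main__":
    print(classify({"comment_tone": "positive", "shares_often": "yes"}))
    for conds, label in tree_to_rules():
        print(" AND ".join(f"{a} = {v}" for a, v in conds), "=>", label)
```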
Architecture
REQUIREMENT ANALYSIS
The project involved analyzing the design of a few applications so as to
make the application more user-friendly. To do so, it was important to
keep the navigation from one screen to the next well ordered while
reducing the amount of typing the user needs to do. To make the
application more accessible, the browser version had to be chosen so
that the application is compatible with most browsers.
REQUIREMENT SPECIFICATION
Functional Requirements
Graphical User interface with the User.
Software Requirements
The following software is required to develop the application:
1. Python
2. Django
3. MySQL
4. mysqlclient
5. WampServer 2.4
Operating Systems supported
1. Windows 7
2. Windows XP
3. Windows 8
This work focuses on the appropriate use of features such as comments, likes and
shares generated by the users of social media. It discusses various methods used
to identify the personality traits of social media users and suggests the use of
text analytics for this purpose. A few areas in which identifying the
personality type of an individual would be beneficial are also mentioned. A
framework is proposed that creates a word vector from Facebook comments and can
be used as a tool to identify the personality trait of the Facebook user.
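One common way to build a word vector from comments is a bag-of-words count over
a fixed vocabulary. The Python sketch below shows this under the assumption that
a simple count vector is meant; the text does not specify the exact
representation, and the function names are illustrative.

```python
import re
from collections import Counter


def tokenize(comment):
    """Split a comment into lower-case word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def build_vocabulary(comments):
    """Return the sorted list of distinct words across all comments."""
    return sorted({word for c in comments for word in tokenize(c)})


def word_vector(comment, vocabulary):
    """Count how often each vocabulary word occurs in the comment."""
    counts = Counter(tokenize(comment))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    comments = ["Great party", "great great day"]
    vocab = build_vocabulary(comments)
    print(vocab)
    for c in comments:
        print(word_vector(c, vocab))
```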