
Received February 1, 2021, accepted February 12, 2021, date of publication February 19, 2021, date of current version March 2, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3060623

T-CREo: A Twitter Credibility Analysis Framework


YUDITH CARDINALE1,2, IRVIN DONGO1,3, GERMÁN ROBAYO2, DAVID CABEZA2, ANA AGUILERA4, AND SERGIO MEDINA2
1 Electrical and Electronics Engineering Department, Universidad Católica San Pablo, Arequipa 04001, Perú
2 Departamento de Computación y Tecnología de la Información, Universidad Simón Bolívar, Caracas 1080, Venezuela
3 Univ. Bordeaux, ESTIA Institute of Technology, 64210 Bidart, France
4 Escuela de Ingeniería Informática, Facultad de Ingeniería, Universidad de Valparaíso, Valparaíso 2340000, Chile

Corresponding author: Ana Aguilera ([email protected])


This research was supported by the FONDO NACIONAL DE DESARROLLO CIENTÍFICO, TECNOLÓGICO Y DE INNOVACIÓN
TECNOLÓGICA – FONDECYT as an executing entity of CONSEJO NACIONAL DE CIENCIA, TECNOLOGÍA E INNOVACIÓN
TECNOLÓGICA - CONCYTEC under grant agreement No. 01-2019-FONDECYT-BM-INC.INV in the project RUTAS: Robots for
Urban Tourism centers, Autonomous and Semantic-based.

ABSTRACT Social media and other platforms on the Internet are commonly used to communicate and generate information. In many cases, this information is not validated, which makes it difficult to use and analyze. Although there exist studies focused on information validation, most of them are limited to specific scenarios. Thus, a more general and flexible architecture is needed, one that can be adapted to user/developer requirements and be independent of the social media platform. We propose a framework to automatically and in real time perform credibility analysis of posts on social media, based on three levels of credibility: Text, User, and Social. The general architecture of our framework is composed of a front-end, a light client proposed as a web plug-in for any browser; a back-end that implements the logic of the credibility model; and a third-party services module. We develop a first version of the proposed system, called T-CREo (Twitter CREdibility analysis framework), and evaluate its performance and scalability. In summary, the main contributions of this work are: the general framework design; a credibility model adaptable to various social networks, integrated into the framework; and T-CREo as a proof of concept that demonstrates the framework's applicability and allows evaluating its performance for unstructured information sources. Results show that T-CREo qualifies as a highly scalable real-time service. Future work includes improving the T-CREo implementation to provide a robust architecture for the development of third-party applications, as well as extending the credibility model to consider bot detection, semantic analysis, and multimedia analysis.

INDEX TERMS API, credibility, fake news, information sources, Twitter, web scraping.

I. INTRODUCTION
Nowadays, social media generates an immense amount of information, since it is what people mostly use to share and read about a wide variety of topics. In this way, information is shared in free environments that can be used in several contexts, ranging from everyday life, global and local news, to the development of new technologies [1]–[3]. Social media and other platforms on the Internet, which allow users to communicate, share, and generate information without formal references to sources, became popular in the early 1990s, producing such a vast amount of information that it fits into the Big Data category. However, in many cases, this information is not documented or validated, which makes it tough to use and analyze. Hence, the concept of credibility, defined as the level of belief that is perceived about (how credible it is) a person, object, or process [4], has become essential in various disciplines and from different perspectives, such as information engineering, business administration, communications management, journalism, information retrieval, and human-computer interaction [5], [6].

However, existing works are limited to the analysis of credibility in specific scenarios (e.g., for a specific social platform or for a particular application). These works differ in the characteristics taken into account to calculate credibility (e.g., attributes of the posts or of the users who posted them, the text of the posts, user social impact) and in the extraction techniques used to gather the information that feeds the credibility models (i.e., web scraping¹ or API).

¹Web scraping is a technique for extracting information, focusing on the generation of structured data.
The associate editor coordinating the review of this manuscript and approving it for publication was Yassine Maleh.


Thus, a more general and flexible architecture is needed, one that can be adapted to user/developer requirements and be independent of the social media platform.

To overcome these limitations, we propose a framework to automatically and in real time perform credibility analysis of posts on social media. The framework instantiates a credibility model proposed in our previous work [4], which consists of the credibility analysis of publications on information sources and is adaptable to various social networks. The credibility model is based on three aspects: Text Credibility (based on text analysis), User Credibility (based on attributes of the user's account, such as creation date and verified status), and Social Credibility (based on attributes that reflect social impact, such as followers and following). In this work, we describe the general architecture of the framework and demonstrate its applicability for unstructured information sources, taking as reference Twitter, one of the most used social media networks.

The characteristics of our proposed framework architecture that distinguish it from existing works are mainly the following:
• It provides two approaches for accessing the information needed by the credibility model: web scraping and the social media API; users/developers can configure the system to gather information only with web scraping or by combining it with the use of the available API;
• It performs credibility analysis automatically and in real time;
• It consists of a front-end, proposed as a web plug-in that can be incorporated into any browser, and a decoupled back-end that executes the credibility analysis;
• It is lightly coupled to external components; as a consequence, it is extensible and flexible; thus, it can be adapted to any social media platform, and the credibility model can be extended by replacing or integrating other measures to calculate different credibility levels.

We develop a first version of the proposed system, called T-CREo (Twitter CREdibility analysis framework), as a proof of concept. As a Google Chrome extension, T-CREo performs the credibility analysis of tweets in real time. According to the study presented in [7], Twitter statistics indicate that around 500 million tweets are published every day. Thus, credibility analysis on such a platform has become a trending topic in the last decade [8]–[11]. There exist many studies proposing Twitter credibility models [4], [8], [11], [12] and more complete studies that also propose frameworks to perform the credibility analysis automatically and in real time [13]–[18]. We qualitatively compare our proposal with the state of the art and show the performance evaluation of T-CREo in various scenarios, with different variables, such as the number of requests and the number of concurrent clients/connections. Results show that the performance of T-CREo qualifies it as a real-time and highly scalable service.

In summary, the main contributions of this work are: (i) the design of a framework to perform credibility analysis on social networks, automatically and in real time; (ii) a credibility model adaptable to various social networks, integrated into the framework; and (iii) T-CREo as a proof of concept that demonstrates the framework's applicability and allows a comparative evaluation with existing systems and an evaluation of its performance.

The remainder of this work is organized as follows. Section II describes and classifies related studies on the credibility of information from Twitter. Section III summarizes the credibility model used in the framework, which was previously proposed in [4]. Section IV describes the general architecture of our proposed framework and T-CREo, an implementation of the described architecture. In Section V, we analyze the results of a battery of tests made to measure the performance of the proposed architecture and its implementation. We conclude in Section VII.

II. RELATED WORK
Existing works consider the extraction and analysis of different types of information to calculate credibility on social media. Thus, several terms of credibility have been proposed [8], [9], [11], [19], [20]. Inspired by these works, we present the following classification of credibility terms in social networks:
• Text Credibility (Post Credibility): measures the level of relevance and accuracy of the text, independent of the referenced topic [8] or with respect to a certain topic [11]. It is calculated through text analysis techniques, such as Natural Language Processing (NLP).
• User Credibility: calculates the credibility of the user account based on attributes that describe it. It can be calculated based on, for example, the account creation date, whether the account is verified, or the user's age.
• Social Credibility: calculates the credibility of a publication, related or not to a topic, based on the available metadata that describe the social impact of the user account and the post itself with respect to other users. It is calculated from data such as the number of followers, the number of accounts followed, and retweets.
• Topic-Level Credibility: measures the level of acceptance of the topic or event referenced in the text. It consists of identifying whether the text refers to a specific topic, usually through NLP and sentiment analysis techniques.

Together, these credibility measures attribute a global credibility level to a publication in a social information source; however, they are usually not considered as a whole; some works consider only one aspect or a subset of these measures to calculate credibility. Moreover, most existing works only expose how the credibility model is computed in a specific context [4], [8], [11], [12], [20], but they do not propose or describe a general architecture to support the process, which is the focus of our work.


This is an important aspect for evaluating whether the proposed approaches can be applied in real-time applications.

How the measures used to calculate credibility are obtained is another issue approached by existing studies. Some of them base the information extraction on the social platform API [19], [21], [22], while others use web scraping techniques [23]–[26]. Social media APIs are easy to invoke, but they impose limitations and restrictions of use, while web scraping is much more flexible but also requires more work and must be adapted each time the HTML structure changes. A few recent works have focused on comparing both extraction techniques on Twitter, for credibility analysis [27] or to gather an unlimited volume of tweets [24].

The studies most related to our work are those focused on proposing a system, an architecture, or a framework to process credibility on social networks. Although our proposed architecture allows configuring the system according to the social network, since our implementation is for Twitter, we survey in this section studies related to Twitter. We compare them in terms of the considered credibility levels, the technique used to extract the information, and their applicability in real-time scenarios.

A. TOOLS FOR CREDIBILITY ANALYSIS ON TWITTER
Some studies have proposed architectures, supported by the Twitter API, to perform Twitter data analytics in real time [21], [22], [28]–[30] or to pre-process data for sentiment analysis, such as the Service Oriented Architecture (SOA) framework proposed in [31]; however, they are not in the scope of credibility analysis, which is our focus in this work.

Concerning credibility analysis, some studies propose off-line systems able to analyze tweet credibility on user demand [32] or from gathered data [33], [34]; hence, they do not offer real-time credibility analysis. In [32], two measures are proposed to calculate topic-level credibility; one of them considers the tweet credibility based on the positive and negative opinions about the topic and the other considers the author's expertise. The system consists of four modules and a tweet opinion database: the tweet sender/receiver is an interactive front-end module, which receives the user's input and provides the result to the user; the tweet credibility calculator module utilizes the tweet opinion classifier to identify both the topic and the opinion of a given tweet; then, it performs a majority decision on contrary opinions on the same topic by using the tweet opinion database, whose entries are stored by the tweet collector module and labeled with a topic-opinion label by the tweet opinion classifier module. Hoaxy [33] is a web platform for tracking social news sharing. The main goal of this work is to let researchers, journalists, and the general public monitor the production of online misinformation and its related fact checking. The system presents three components: monitors (URL tracker, scrapy spider, RSS parser), a database as repository, and an analysis dashboard. The system collects data from two main sources, news websites and social media, by using web scraping, web syndication, and, where available, the APIs of social networking platforms. Afterward, this stored data is analyzed considering two aspects: the temporal relation between the spread of misinformation and fact checking, and the differences in how users share them. Being off-line systems, these platforms are not able to support real-time analysis. An approach to detect Sources Of Fake News (SOFNs), by analysing the credibility of tweets based on graph Machine Learning, is proposed in [34]. The credibility analysis is based on user features (e.g., created at, name, default profile, default profile image, favourites count, statuses count, description), the social graph of users (followers/following graph), and topic annotations, whose information is gathered with the Twitter API. Binary Machine Learning classifier models are fed with these features to predict SOFNs.

Some other studies, more related to our work, have proposed architectures to perform real-time credibility analysis on Twitter [13]–[18]. In [13], a system for automatically measuring the credibility of Arabic news content published on Twitter is presented. In this system, in addition to considering the text content, characteristics associated with the user account are evaluated, such as its verified quality and its Twittergrader.com score (i.e., text, user, and social credibility levels). Twittergrader.com measures the power, scope, and authority of a Twitter account based on the number of followers, followers impact, updates, credibility of news, followers/following relationship, and commitment. The system architecture is supported by the Twitter API and consists of four main components: text pre-processing, feature extraction and computation, credibility calculation, and credibility assignment and ranking.

TweetCred and CredFinder are described in [14] and [15], respectively, two practical solutions proposed as Google Chrome extensions that calculate tweet credibility in real time, considering the content of the text, attributes of the tweet (publication time, source from which the tweet was posted), and social impact. Both use the Twitter API. The TweetCred [14] architecture is composed of a back-end and a front-end; the back-end is responsible for calculating the credibility score, based on the Twitter API to fetch the data about individual tweets and on a Support Vector Machine (SVM) prediction model; the front-end is a local interface embedded in the Google Chrome browser, from which the tweet IDs are scraped and sent to the back-end and in which credibility scores are shown. The credibility of each tweet is calculated in terms of the text (metadata and content) and social information related to the user (e.g., followers, following) and to the tweet (e.g., retweets, mentions). CredFinder [15] consists of a front-end in the form of an extension to the Google Chrome browser and a web-based back-end. The former collects tweets in real time from a Twitter search or a user-timeline page and the latter is based on four components: a reputation-based component, a credibility classifier engine, a user experience component, and a feature-ranking algorithm. Using the Twitter streaming API, tweets and their meta-data (the time of posting, the author name, number of followers, number of friends, hashtags or mentions, etc.) are obtained. All these data are used as input to the credibility score calculation algorithm.


Another web interface framework, implemented as a web plug-in system, is proposed in [16]. The aim is to analyze, in real time, the credibility of tweets regarding a specific topic. Only the text of each tweet is analyzed, to be classified as entailment, neutral, or contradiction with respect to the topic. The system shows a list of news information related to the topic; thus users can decide the veracity of the tweet in question. The Twitter API is used to collect tweets, web scraping is used to get the URLs referenced in tweets, and the Bing news API is used to find articles and retrieve news headlines related to the topic. In [17], a framework for credibility analysis on Twitter data, with disaster situation awareness, is proposed. This framework is able to calculate in real time the topic-level credibility (i.e., emergency situations), by analysing the text, linked URLs, number of retweets, and geographic information extracted from both post text and external URLs, which are kept in a database. Thus, an event with a higher credibility score indicates that there are more tweets, more linked URLs, and more retweets mentioning this event. Data is collected through the Twitter API, to get the information of the tweets, and the Google Maps Geocoding API, to obtain geolocalization information. The text is analyzed in the event identification module to extract words that match keywords describing a specific event (i.e., a disaster situation topic); then its credibility is calculated by the credibility module.

A more recent system for real-time credibility analysis is described in [18]. This framework considers text and user credibility, aimed at identifying fake users and fake news, based on neural network models. Text credibility is measured in terms of retweets, followers, favorites (i.e., social impact), and the number of relevant words and the sentiment score (i.e., the text content is analyzed). User credibility is computed by considering the location, URLs, whether the account is verified, the geolocation, the creation date, and the most recent 20 tweets posted by the user. The Twitter API is used to retrieve these data. The system comprises tweet and user monitoring modules, a learning module, and a credibility module.

B. COMPARATIVE ANALYSIS
All the works described in the previous section are mainly based on the Twitter API and use web scraping only to obtain the tweet IDs. None of them offer both extraction possibilities independently. In contrast, the T-CREo framework offers users the possibility of choosing the extraction technique most appropriate to their capabilities (e.g., Twitter API permissions); thus, it can be configured according to user requirements.

Table 1 summarizes the comparison of the reviewed works according to the four credibility levels, their applicability for real-time analysis, the extraction technique (API, web scraping), the application scope, and the main characteristics of their implementations. The common characteristics of most reviewed works are that they do not consider all four levels of credibility and that they rely on the Twitter API. Related to credibility levels, only the systems proposed in [13], [18] assess credibility in terms of text, user, and social factors, as we do in T-CREo; credibility related to a specific topic is not considered. Although this aspect grants generality to the systems, we plan to include the topic level in the future, as an option that users can configure. The analysis of the text in T-CREo is done through filters that detect SPAM, bad words, and misspelling. We do not yet consider including Machine Learning algorithms or Sentiment Analysis, as in [16], [17], [32]–[34]. However, such algorithms in the context of specific topics are under consideration in our ongoing research work. The T-CREo architecture is flexible enough to easily incorporate them, thereby completing all four aspects of the credibility of a tweet.

Concerning the extraction technique, even though all these works use web scraping to get some data (e.g., tweet IDs), they base their data gathering on the Twitter API. Only the system proposed in [33] and our proposal T-CREo offer web scraping as a technique independent from the API, which makes these systems more flexible and adaptable to users'/developers' needs and capabilities; however, the system proposed in [33] is applicable only to news verification on specific topics.

Real-time credibility analysis is an important aspect considered in most of these works, including T-CREo. T-CREo separates the system into a light front-end, as a Google Chrome extension, and a back-end, which performs the heavy calculations, similar to TweetCred [14], CredFinder [15], and the system proposed in [16]. A front-end designed as a web plug-in or extension makes it possible to automatically perform the analysis, without the intervention of final users. Additionally, the T-CREo front-end is able to recognize whether the social media platform is Twitter or Facebook, thus performing the credibility analysis accordingly.

III. CREDIBILITY ANALYSIS MODEL FOR SOCIAL MEDIA
In order to calculate the credibility of a post in a social network, our proposed framework uses the credibility model proposed in a previous work [27]. The credibility measure mainly depends on two components: (i) the post's content, that is, a text, which for Twitter is less than 240 characters (as of 2020); and (ii) the author, the user that published the post. Features of the text and the user are extracted to feed the credibility model, which consists of three credibility measures: Text Credibility, User Credibility, and Social Credibility. Figure 1 shows a general view of this credibility model. Text Credibility is entirely related to the post's text, while User Credibility and Social Credibility are calculated using the user's attributes. Each credibility measure is based on several components that we call filters. Hence, the model becomes easy to implement, flexible, and extensible. It does not need advanced data manipulation, which makes it ideal for real-time applications. In the following sections, we describe the credibility model in detail.


TABLE 1. Related Work Comparison

FIGURE 1. Credibility model.

A. TEXT CREDIBILITY
Text credibility analyzes syntactically the content of the post (without checking the author's attributes), through SPAM, bad words, and misspelling filters, as shown in Def. 1.
Definition 1 (Text Credibility (TextCred)): Given the text of a post, p.text, the Text Credibility is a function, denoted as TextCred(p.text), that returns a measure ∈ [0, 100], defined as:
TextCred(p.text) = w_SPAM × isSpam(p.text) + w_BadWords × bad_words(p.text) + w_MisspelledWords × misspelling(p.text)
where:
• isSpam(p.text) is a SPAM detector that determines the probability ∈ [0, 100] of p.text being spam;
• bad_words(p.text) measures the proportion ∈ [0, 100] of bad words with respect to the number of words in the text;
• misspelling(p.text) measures the proportion ∈ [0, 100] of misspelling errors;
• w_SPAM, w_BadWords, and w_MisspelledWords are user-defined parameters that indicate the weights the user gives to each filter, such that w_SPAM + w_BadWords + w_MisspelledWords = 1.

B. USER CREDIBILITY
User credibility analyzes only the user as a unit of the platform, without being influenced by other users, as described in Def. 2.
Definition 2 (User Credibility (UserCred)): Given a set of metadata of a user who published a post, p.user, the User Credibility is a function, denoted as UserCred(p.user), that returns a measure ∈ [0, 100], defined as:
UserCred(p.user) = Verif_Weight(p.user) + Creation_Weight(p.user)
where:
• Verif_Weight(p.user) is a function that returns 50 if the user is verified and 0 otherwise;
• Creation_Weight(p.user) measures the time since the user's account was created, with a value between 0 and 50 that increases with the longevity of the account, such that:
Creation_Weight(p.user) = (Account_Age(p.user) / Max_Account_Age) × 50
where:
– Account_Age(p.user) = CurrentYear − YearJoined(p.user)
– Max_Account_Age = CurrentYear − SocialPlatform_Creation_Year
– SocialPlatform_Creation_Year is the year in which the targeted social platform was created (e.g., 2006 for Twitter).
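To make Definitions 1 and 2 concrete, the following is a minimal sketch in TypeScript (the language of the T-CREo back-end); it is not the production code, and the three filter bodies are placeholders for the real SPAM, bad-word, and misspelling detectors described later.

  // Sketch of Definitions 1 and 2; filter implementations are placeholders, each returning a score in [0, 100].
  interface TextWeights { wSpam: number; wBadWords: number; wMisspelledWords: number } // weights must sum to 1

  const isSpam = (text: string): number => 0;        // probability that the text is spam
  const badWords = (text: string): number => 0;      // proportion of bad words in the text
  const misspelling = (text: string): number => 0;   // proportion of misspelled words in the text

  // Definition 1: weighted sum of the three text filters.
  function textCred(text: string, w: TextWeights): number {
    return w.wSpam * isSpam(text) + w.wBadWords * badWords(text) + w.wMisspelledWords * misspelling(text);
  }

  // Definition 2: 50 points for a verified account plus up to 50 points for account longevity.
  function userCred(verified: boolean, accountCreationYear: number, platformCreationYear = 2006): number {
    const currentYear = new Date().getFullYear();
    const accountAge = currentYear - accountCreationYear;          // Account_Age
    const maxAccountAge = currentYear - platformCreationYear;      // Max_Account_Age (2006 for Twitter)
    const verifWeight = verified ? 50 : 0;
    const creationWeight = (accountAge / maxAccountAge) * 50;
    return verifWeight + creationWeight;
  }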


C. SOCIAL CREDIBILITY
Social credibility focuses on the relations between a user account and the other accounts in the social media platform. It considers the number of followers and the number of accounts followed (see Def. 3).
Definition 3 (Social Credibility (SocialCred)): Given a set of metadata of a user who published a post, p.user, the Social Credibility is a function, denoted as SocialCred(p.user), that returns a measure ∈ [0, 100], defined as:
SocialCred(p.user) = FollowersImpact(p.user) + FFProportion(p.user)
where:
• FollowersImpact(p.user) = (min(p.user.followers, MAX_FOLLOWERS) / MAX_FOLLOWERS) × 50 measures the impact ∈ [0, 50] of the number of followers;
• FFProportion(p.user) = (p.user.followers / (p.user.followers + p.user.following)) × 50 measures the proportion ∈ [0, 50] between the number of followers and the number of accounts followed by the user;
• MAX_FOLLOWERS is a user-defined parameter.
The MAX_FOLLOWERS constant is supplied by the user; for example, in [27] it is set to 2 million. FFProportion is self-explanatory: a simple proportion that increases the credibility if the user has more followers than followings. The purpose of this function is to discredit bots, which tend to have more followings than followers.

D. TOTAL CREDIBILITY LEVEL
The credibility of a post is a weighted sum of the three credibility measures described previously. Def. 4 shows how it is calculated. According to the social network, the respective features for User Credibility and Social Credibility have to be identified and obtained. Table 2 shows the equivalent attributes for Twitter, Facebook, Reddit, Instagram, and LinkedIn.

TABLE 2. Available Attributes for Social Networks

Definition 4 (Global Credibility Level (GCred)): Given a post, p, the Global Credibility Level is a function, denoted as GCred(p), that returns a measure ∈ [0, 100] of its level of credibility, defined as:
GCred(p) = weight_text × TextCred(p.text) + weight_user × UserCred(p.user) + weight_social × SocialCred(p.user)
where:
• weight_text, weight_user, and weight_social are user-defined parameters that indicate the weights the user gives to Text Credibility, User Credibility, and Social Credibility, respectively, such that weight_text + weight_user + weight_social = 1;
• TextCred(p.text), UserCred(p.user), and SocialCred(p.user) represent the credibility measures related to the text, the user, and the social impact of p, respectively.

E. TIME COMPLEXITY ANALYSIS
Concerning computational complexity, User Credibility and Social Credibility are computed in constant time, independent of the post's size, that is, O(1). However, Text Credibility depends on the length of the text. Let n be the number of words in the text:
• isSpam is defined as a function that checks whether the text is spam, checking every word. Thus, the time complexity of this function is O(n).
• bad_words needs to iterate over the words of the text and check whether each word is a bad word. Then, the complexity of bad_words is also O(n);
• The analysis of misspelling is analogous to that of bad_words: each word is grammatically verified. Thus, the complexity of misspelling is O(n).
Since the three filters are performed sequentially, the time complexity of Text Credibility is 3 × O(n). Moreover, since the time complexity of User Credibility and Social Credibility is O(1), the time complexity of calculating the Credibility Level of a post is bounded by O(n), with n representing the number of words in the post. Hence, for short texts, as in Twitter, this time complexity does not represent a high consumption of computational resources.
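Continuing the sketch above (helper names assumed, not the production code), Definitions 3 and 4 translate directly into constant-time arithmetic over the user's metadata, which is why only the text filters contribute the O(n) term.

  // Definition 3: followers impact plus followers/following proportion, each in [0, 50].
  function socialCred(followers: number, following: number, maxFollowers = 2_000_000): number {
    const followersImpact = (Math.min(followers, maxFollowers) / maxFollowers) * 50;
    const total = followers + following;
    const ffProportion = total === 0 ? 0 : (followers / total) * 50;  // guard against 0/0
    return followersImpact + ffProportion;
  }

  // Definition 4: weighted sum of the three measures (weights must sum to 1).
  function globalCred(
    text: number, user: number, social: number,
    wText: number, wUser: number, wSocial: number
  ): number {
    return wText * text + wUser * user + wSocial * social;
  }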


IV. A FRAMEWORK FOR CREDIBILITY ANALYSIS IN SOCIAL MEDIA: OUR PROPOSAL
In this section, we describe the general architecture of our framework to perform real-time credibility analysis on social platforms. Afterward, we present T-CREo, an implementation of our framework for Twitter, as a proof of concept.

A. FRAMEWORK ARCHITECTURE
The general architecture of our proposed framework, shown in Figure 2, follows a Client-Server pattern. The framework is composed of:
• The front-end, a light client proposed as a web plug-in for any browser, which allows analysing the Global Credibility of posts from a social network platform in real time; it also provides the option of analysing a plain text provided by the user, independently of the social platforms, in which case only the Text Credibility is obtained;
• The back-end, the "source of truth" of the framework; it implements the logic of the credibility model and provides mechanisms to calculate each credibility measure;
• Third-party services, a module that groups the services that the back-end consumes directly and that are not of our intellectual property; they are our data sources to calculate the credibility of external entities in the credibility model; in this module we have applications like Twitter and Facebook, which provide all the information related to the posts and users of the platform.

FIGURE 2. General architecture of the proposed framework.

The front-end, a web extension (client), sends requests for Global Credibility calculation, along with some parameters, to the back-end. In the case of users providing a plain text, it requests Text Credibility analysis. The back-end (server) receives the request and returns a response, based on its contents, to the front-end. The back-end is also a client of the third-party services web module, which is not under our control. Arrows in Figure 2 define the dependency relationships among the components from a high-level view.

The relationship between the front-end and the back-end is mandatory, while the connection to a third-party service is optional and is only needed when the respective API is used. The optional dependency implies that the application can still be usable without access to the social network API, but our proposal needs the back-end to be able to connect to it. Note that the third-party services module is not exclusively limited to Twitter or Facebook.

The credibility model can be adapted to any other social network. Thus, it is possible to add even more information sources (usually social media) accessible through the back-end. In the following, we describe the front-end and the back-end of the framework in depth.

1) FRONT-END
In our proposed architecture, the front-end is a web plug-in or a browser extension, to avoid user intervention in the process of obtaining the Global Credibility of posts in a social network platform. In this way, it is able to do web scraping and to make HTTP requests. The front-end is logic-less – i.e., it does not perform any calculation related to the credibility model, which is delegated to the back-end. It allows users to configure and set all user-defined parameters to adjust the results to their needs. It is able to recognize whether the social platform is Facebook or Twitter and permits configuring the data extraction mechanism used to gather the information to be sent to the back-end (i.e., web scraping of all information, or scraping of the post IDs combined with the social media API). When the web scraping option is selected, the front-end sends the request to perform Global Credibility analysis along with all parameters needed for the credibility model (e.g., number of followers, date of creation of the account); in this case, such parameters are obtained with web scraping at the front-end. In contrast, if the API option is selected, the Global Credibility request is accompanied only by the post IDs. In addition, the front-end provides an interactive option for users to request Text Credibility analysis of provided plain texts, independently of the social platforms.

Implementations of the front-end as web, mobile, or desktop applications are not considered, because they do not allow performing web scraping, which is a requirement in our proposal. However, this type of client can interact with the back-end through its API (REST or SOAP), by manually providing the required parameters (e.g., post IDs, user IDs) to gather information and get successful responses.
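As an illustration of the two request modes just described, a front-end call could look like the following sketch; the endpoint paths, host, and parameter names here are assumptions for illustration only (the actual T-CREo endpoints are introduced in Section IV-B2).

  // Sketch only: endpoint names and parameters are illustrative, not the exact T-CREo API.
  const BACKEND = 'https://backend.example.org';

  // Web scraping mode: the plug-in has already scraped every model parameter and sends them all.
  async function requestWithScrapedData(params: Record<string, string>): Promise<number> {
    const res = await fetch(`${BACKEND}/credibility/scraped?` + new URLSearchParams(params));
    return (await res.json()).credibility;
  }

  // API mode: only the post ID travels; the back-end fetches the rest from the social platform API.
  async function requestWithPostId(postId: string): Promise<number> {
    const res = await fetch(`${BACKEND}/credibility/post?` + new URLSearchParams({ postId }));
    return (await res.json()).credibility;
  }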

FIGURE 3. General architecture of the back-end.

2) BACK-END
The reason why the back-end is separated from the front-end is that, this way, any external agent from the architecture can access it. It also makes the system more extensible, since it does not have to replicate the logic of the model, only request it. If at any moment there is the need to add another module, it only has to know how to access and use the back-end resources, without knowing what happens behind, making everything more declarative. The back-end offers endpoints to access the mechanisms to calculate all credibility measures, based on a REST API or a SOAP API (although we recommend the former because of its flexibility, wide use, and support).

The architecture of the back-end, as shown in Figure 3, is based on a layered pattern:
• The first layer corresponds to the Controllers, which contain the REST or SOAP API endpoints that the back-end exposes. Each endpoint is responsible for data serialization and de-serialization, validation, and calling the domain layer. There should be at least one endpoint to calculate Text Credibility for plain texts provided by users and one to calculate the Global Credibility of posts of a social network platform. Global Credibility requests can be performed through endpoints with any of the following purposes:
– Calculate Global Credibility by gathering all credibility model parameters from the social platform API.
– Calculate Global Credibility without calling the social platform API – i.e., all needed arguments are received from the front-end along with the requests; the front-end collects all parameters based on web scraping.
The framework offers this choice to allow the development of implementations even when access to the social platform API is not possible. Notice that the proposal does not require implementing endpoints to calculate Social Credibility and User Credibility separately, since the front-end requests Global Credibility, which in turn internally invokes the corresponding Calculators for all credibility measures at the domain layer.
• Calculators at the domain layer implement the credibility model. There is a Calculator for each credibility aspect, which invokes the Data Providers layer if necessary.
• Each module of the Data Providers layer corresponds to a part of the credibility calculus. It is not mandatory that the back-end implements each module, since the functionality can be provided by third-party libraries. For example, in the case of the web scraping option, for Text Credibility, the IsSpam, bad_words, and misspelling filters are implemented in this layer; however, for User Credibility and Social Credibility, nothing is needed at this layer. In contrast, in the case of the API option, besides the Text Credibility filters, Social Network Connectors should be implemented at this layer to gather the needed parameters from the third-party services.

The Social Network Connector acts as a bridge between our REST or SOAP API and the social platform API, querying the necessary information to calculate the User and Social Credibility needed for the Global Credibility. Notice that the optional dependency between the Calculators layer and the Social Network Connector module makes the framework work without having access to the social platform API.

The rest of the modules of the Data Providers layer, namely Dictionaries, Bad words dictionaries, and Spam detector, are used for the misspelling, bad_words, and IsSpam filters of the Text Credibility, respectively.

The following section describes an implementation of our proposed framework for Twitter.
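A minimal sketch of how this layering could look in Express.js is shown below; the route path, module name, and parameter names are illustrative assumptions, with the controller only validating and delegating to the Calculators, as described above.

  // Sketch of the Controllers -> Calculators layering; names are illustrative, not the real code.
  import express from 'express';
  // Calculators layer (domain): pure functions such as those sketched in Section III.
  import { textCred, userCred, socialCred, globalCred } from './calculators';

  const app = express();

  // Controllers layer: deserialize and validate the query, then delegate to the domain layer.
  app.get('/credibility/scraped', (req, res) => {
    const q = req.query as Record<string, string>;
    const text = textCred(q.text, { wSpam: +q.wSpam, wBadWords: +q.wBadWords, wMisspelledWords: +q.wMisspelledWords });
    const user = userCred(q.verified === 'true', +q.creationYear);
    const social = socialCred(+q.followers, +q.following);
    res.json({ credibility: globalCred(text, user, social, +q.wText, +q.wUser, +q.wSocial) });
  });

  app.listen(3000);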


B. T-CREo: AN IMPLEMENTATION OF THE CREDIBILITY ANALYSIS FRAMEWORK
In this section, we describe T-CREo (Twitter CREdibility framework),² an implementation of the proposed framework architecture to perform credibility analysis on Twitter. We describe each component as follows.

²GitHub repository: https://github.com/t-creo

1) FRONT-END
The front-end of T-CREo is an extension for Google Chrome. According to our proposal, the front-end allows users to:
• Analyze the Text Credibility of a provided text.
• Configure the parameters required for the credibility model.
• Select the data extraction technique – i.e., users decide whether they want to base the data gathering on the Twitter API or on web scraping.
The web extension detects the website (twitter.com or facebook.com) on the current browser tab and updates its user interface accordingly. In this version of T-CREo, the facebook.com option is off. In the case of twitter.com, if the page is the home timeline, it shows only the option to verify tweets using the Twitter API (see Figure 4a); if it is a user's profile timeline, the option to verify tweets using web scraping appears next to the option to use the Twitter API (see Figure 4b). The reason for showing only the Twitter API option on the home timeline is that there is no way to scrape the user and account data of each tweet's author, required to calculate the tweets' credibility; only the tweet IDs are accessible via web scraping; thus, the rest of the attributes must be gathered via the Twitter API. On any Twitter user's profile, for full web scraping data gathering, the front-end assumes that all tweets in that timeline are of the user's authorship, including retweets (retweets can be considered as a copy of another tweet). In that way, we can collect all user-related information of the tweets from the header of the user's profile web page and the specific parameters of each tweet (i.e., text and language) from each single tweet in the feed.

FIGURE 4. Front-end on twitter.com.

After selecting the extraction technique, either web scraping or Twitter API, the front-end analyzes the current timeline, extracting the necessary information, sending it to the back-end, and finally presenting the results over the timeline. Table 3 describes how each argument is scraped from the HTML using Javascript, when web scraping is selected as the data extraction technique. If the user chooses to verify tweets using the Twitter API, only the tweet IDs are scraped. Section IV-B3 describes in detail how the model attributes are extracted via the Twitter API.

TABLE 3. Data Extraction Attributes via Web Scraping
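For illustration of the kind of extraction that Table 3 documents, a content-script fragment could look like the sketch below; the CSS selectors are hypothetical, since the real ones depend on Twitter's current HTML structure and must be updated whenever it changes.

  // Hypothetical selectors: Twitter's markup changes often, so these are placeholders only.
  function scrapeProfileHeader() {
    const followers = document.querySelector('[data-testid="followers"]')?.textContent ?? '0';
    const following = document.querySelector('[data-testid="following"]')?.textContent ?? '0';
    const verified = document.querySelector('[data-testid="verified-badge"]') !== null;
    return { followers: parseInt(followers, 10), following: parseInt(following, 10), verified };
  }

  // Collect the per-tweet parameters (text and language) from each tweet node in the feed.
  function scrapeTweets(): { id: string; text: string }[] {
    return Array.from(document.querySelectorAll('article[data-testid="tweet"]')).map(node => ({
      id: node.getAttribute('data-tweet-id') ?? '',
      text: node.querySelector('[lang]')?.textContent ?? '',
    }));
  }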


Although the front-end automatically starts the credibility analysis of the tweets in the timeline of the current account, it offers the possibility of analysing a provided text, based on the IsSpam, bad_words, and misspelling filters, even if the user is not on the Twitter website.

FIGURE 5. T-CREo front-end as a Google Chrome extension when opened on any website that is not Twitter.

Figure 5 shows how the extension looks when the user is on any website that is not Twitter or Facebook. On the text analysis window, the user can enter the text, select the language of the text (i.e., English, Spanish, or French), and ask for credibility analysis. The Verify button makes the request to the back-end to perform Text Credibility analysis over the text that the user has supplied. After clicking the Verify button, the response looks like Figure 6a for a text in English and Figure 6b for a text in Spanish.

FIGURE 6. Response after clicking the Verify button.

Through the T-CREo front-end, the user can see and edit the parameters for the credibility model (see Figure 7). It verifies that the sum of the weights in each section equals 1. Users can access this configuration window by right-clicking the extension icon in the toolbar and then selecting Options, or by navigating to the extension management page at chrome://extensions, locating the desired extension, clicking Details, and then selecting the Options link.

FIGURE 7. Parameters configuration on the T-CREo front-end.

Figure 8 shows a capture from the timeline of the @dhall_lang³ account, which on that date (June 26, 2020) was not verified, had 1338 followers, and followed 0 accounts. The user-defined parameters of the credibility model for this case are presented in Table 4.

FIGURE 8. Result of pressing any button under an account's timeline.

TABLE 4. Parameters Used to Calculate Credibility on the @dhall_lang Account's Timeline

³https://twitter.com/dhall_lang

2) BACK-END
The T-CREo back-end is a REST API developed using Express.js⁴ that implements the proposed architecture for credibility calculation. It exposes several endpoints, but the most important are:
• /plain-text: it receives the necessary parameters to calculate the Text Credibility, which are:
– The weight of each filter, three in total: w_SPAM, w_BadWords, and w_MisspelledWords.
– The text to analyze, t.text, or just a provided text.
– The language of the text. At this moment, it supports English, Spanish, and French. It is supplied via the lang query parameter and the possible values are en, es, and fr. The text language is necessary to correctly choose the orthography and bad words dictionaries. Currently, we do not have access to a language detection API or library.
This endpoint corresponds to the leftmost branch of Figure 1.

⁴https://expressjs.com/
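As a usage sketch of this endpoint, a client request could look as follows; the host, the exact query-parameter names for the weights, and the response shape are assumptions here, only the lang values follow the description above.

  // Assumed host and parameter names; only lang (en, es, fr) is documented above.
  const params = new URLSearchParams({
    text: 'This is the text to analyze',
    lang: 'en',
    weightSpam: '0.33', weightBadWords: '0.33', weightMisspelling: '0.34',  // must sum to 1
  });
  fetch(`http://localhost:3000/plain-text?${params}`)
    .then(res => res.json())
    .then(body => console.log('Text Credibility:', body));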


• /twitter/tweets: it receives the parameters to calculate the credibility of a tweet. It uses the same parameters as the /plain-text resource plus the following ones:
– The weight of each filter, three in total: weight_social, weight_user, and weight_text.
– The tweet ID to analyze.
• /twitter/scraped: this endpoint behaves similarly to the /twitter/tweets endpoint. The difference is that, unlike the latter, this endpoint does not request any information from the Twitter API but requires passing the attributes explicitly via query parameters. This is useful to test the whole model, since the Twitter API has certain usage restrictions⁵ per endpoint that do not allow making many requests in a 15-minute window, and for consumers there may be cases where there is no way to obtain the tweet ID but all of the other information needed to calculate the Global Credibility is available. This endpoint receives the weight of each filter and the following attributes:
– Whether the tweet's author is verified;
– The account creation year;
– The number of followers of the author;
– The number of accounts this author follows (followings).
Both the /twitter/tweets and /twitter/scraped endpoints use the same method to calculate the Global Credibility of tweets. They both correspond to the whole credibility model shown in Figure 1. These endpoints correspond to the Controllers layer from Figure 3. The logic behind each endpoint is in the Calculators layer, where every filter from the credibility model proposed in [35] is implemented as a Javascript function. Each filter function just receives the parameters, without worrying about how they were gathered. Thus, it is possible to reuse the same filter on each endpoint without duplicating code: what really changes between the endpoints is whether the parameters were gathered through the Twitter API or by web scraping. These filters also interact with the Data Providers layer, specifically to invoke the Text Credibility filters.

Since our API is written in Node.js v1.16.0,⁶ we have access to the whole NPM package ecosystem.⁷ The dictionaries used for the misspelling filter of Text Credibility are provided by Wooorm/dictionaries,⁸ a GitHub repository containing dictionaries for several languages. To detect whether a word is a bad word, we use the washyourmouthwithsoap NPM package, which is in essence a database of bad words with a method to look up a word in a specific language. For SPAM detection, we use a fork of the simple-spam-filter NPM package, because the authors of the package stopped maintaining it in 2019. Our fork⁹ uses the original logic, replacing the dictionary libraries with the ones from Wooorm/dictionaries. To connect with the Twitter API, we use twit,¹⁰ which provides a Node.js API client.

More features can be added to the credibility model by enhancing our back-end. One of those enhancements concerns the Social Credibility filter, by gathering historical tweets from a user. This can be done by running a cron job (a Unix tool) that gathers the latest tweets from a set of users. This set could be manually defined or automatically updated every time a tweet from an unknown author is analyzed by the /twitter/tweets endpoint. Having these historical data, we can tune the model to give a more accurate score. Other tweet attributes can also be taken into account, such as the number of retweets and likes of a tweet, which can influence its credibility. Another improvement, more related to Text Credibility, is to perform sentiment or context analysis on the tweets and study how that affects credibility.

3) THIRD-PARTY SERVICES
At this moment, T-CREo only uses Twitter as a third-party web service. The Twitter API access is done through its website.¹¹ From this API, T-CREo uses the following two resources:
• GET users/show/:user_id, which receives a user_id and returns the information related to that user. From this resource we use the following fields:
– verified, a boolean indicating whether the user is verified;
– created_at, the date when the user account was created;
– followers_count, the number of followers of the user;
– friends_count, the number of accounts that the user follows.
• GET statuses/show/:id, which receives a tweet ID and returns all its information, including that of the author. From this resource we use the following fields:
– full_text, which contains the tweet's text;
– lang, the tweet's language. This is an important parameter to perform a correct calculation of Text Credibility;
– All of the aforementioned fields from the GET users/show/:user_id section.

⁵https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/get-statuses-show-id
⁶https://nodejs.org/en/
⁷https://www.npmjs.com/
⁸https://github.com/wooorm/dictionaries
⁹https://github.com/t-creo/back-end/blob/54454ff3c927dfb932083c50d97732e1a3676519/src/calculator/spam-filter.ts
¹⁰https://github.com/ttezel/twit
¹¹https://developer.twitter.com/content/developer-twitter/en
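A sketch of how the back-end could query the statuses/show resource with the twit client is shown below (credentials and error handling omitted, function name assumed); only the fields listed above are read from the response.

  import Twit from 'twit';

  // Credentials come from the environment; the values themselves are placeholders.
  const T = new Twit({
    consumer_key: process.env.TWITTER_KEY!, consumer_secret: process.env.TWITTER_SECRET!,
    access_token: process.env.TWITTER_TOKEN!, access_token_secret: process.env.TWITTER_TOKEN_SECRET!,
  });

  // GET statuses/show/:id with tweet_mode=extended so that full_text is populated.
  async function fetchTweetForModel(id: string) {
    const { data } = await T.get('statuses/show/:id', { id, tweet_mode: 'extended' });
    const tweet = data as any;
    return {
      text: tweet.full_text, lang: tweet.lang,
      verified: tweet.user.verified, createdAt: tweet.user.created_at,
      followers: tweet.user.followers_count, following: tweet.user.friends_count,
    };
  }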


Each endpoint has a usage limit, but it is not a limitation at this moment, since the application is currently in a testing phase. We are planning to create a tweet database to keep more recent tweets and perform more experiments.

In the next section, we present the experiments to evaluate the performance of T-CREo.

V. EXPERIMENTAL EVALUATION
The experimental evaluation aims to present a quantitative analysis of T-CREo's performance over variables such as the number of requests and the number of concurrent clients. To be specific, we perform tests that involve simulated users making requests, to verify whether the T-CREo framework can support the anticipated load and to measure its performance and scalability. The focus is to show how efficiently T-CREo behaves under the pressure of a certain number of requests and concurrency level. The script used to perform the evaluation is available online.¹²

A. ENVIRONMENT AND TESTS SETUP
For the deployment of the T-CREo framework, we used DigitalOcean cloud services.¹³ Servers on DigitalOcean are called droplets. We hosted the T-CREo back-end in four DigitalOcean droplets located in San Francisco, CA (SF), and New York, NY (NY), as datacenter regions. We have implemented an automated deployment of the framework within Docker containers¹⁴ (deploy.sh and TravisCI), using Docker server version 19.03.12 with Ubuntu 20.04. The droplets are referred to as High (i.e., SF High and NY High) for dedicated CPUs, while those denoted as Low (i.e., SF Low and NY Low) have shared CPUs. High droplets have the following characteristics:
• Price: $320/month
• RAM/CPU: 32GB / 16 CPUs
• Disk: 200 GB SSD disk
• Network: 7 TB outbound data transfer (inbound bandwidth to droplets is always free).
On the other hand, the Low droplets' features are:
• Price: $5/month
• RAM/CPU: 1GB / 1 CPU
• Disk: 25 GB SSD disk
• Network: 1 TB outbound data transfer (inbound bandwidth to droplets is always free).

Under normal conditions, the web extension (i.e., T-CREo's front-end) is responsible for scraping the tweet data and making requests to the back-end to obtain the credibility score. This sequence of operations includes local in-memory read operations (at the front-end), Internet access to send/receive the front-end's request/response to/from the back-end, and the back-end operations to calculate the credibility score. Since local read operations are performed at the front-end in negligible times (< 0.1 ms), and to facilitate the tests, we developed a shell script that simulates the front-end tasks, called the script-client. Thus, the script-client can be executed at the same droplet where the back-end is executed, to avoid the impact of Internet access in the evaluation of the back-end performance. Non-local tests involve Internet latency, which prevents analyzing the behavior of the back-end in a controlled environment. Nonetheless, we also execute the script-client on a different machine to measure the Internet impact.

We use Apache Benchmark,¹⁵ a tool for evaluating the performance and behaviour of Apache HTTP servers, to execute the tests. In particular, it shows how many requests per second an Apache installation can handle. In the script-client, we embed the data of the tweets that we randomly select. Thus, Apache Benchmark makes the requests to the back-end and captures the performance measures, with which we perform the analysis, supported by the multiple options the tool provides. For our evaluation, we used concurrency (-c) and requests (-n):
• Concurrency (-c): represents the number of multiple requests to perform at a time. By default, Apache Benchmark executes one request at a time.
• Requests (-n): specifies the number of requests to perform for the benchmarking session. The default is to perform a single request, which usually leads to non-representative benchmarking results.

In our experiments, we test with 5, 10, 20, and 30 levels of concurrency (or number of simultaneous connections/clients), each one with 50, 100, 200, 500, 1000, and 2000 requests (i.e., 24 different experiments). The number of requests is evenly divided between the simultaneous clients. Since the back-end is implemented as a JavaScript runtime, which is single-threaded, the concurrency level represents the simultaneous connections that the back-end has to manage; it does not represent the number of simultaneous requests attended. These 24 experiments were performed in six different scenarios:
- Scenario a: Two local scenarios in SF droplets: the script-client and the back-end are executed at the same Low and High droplet, denoted as Local SF Low and Local SF High.
- Scenario b: Two remote scenarios with machines in SF: the script-client is executed on a machine different from the back-end, resulting in SF Low - SF Low and SF High - SF High.
- Scenario c: Two remote scenarios with the script-client executed on an SF Low machine and the back-end on NY Low and NY High machines, giving SF Low - NY Low and SF Low - NY High. In this scenario, we execute the script-client on the Low machine configuration, since users can access the service from mobile devices or desktop PCs, which are not necessarily powerful machines.
In total, we present in Section V-C results for 24 × 6 = 144 experiments.

¹²https://github.com/t-creo/back-end/blob/develop/scripts/script-evaluation.sh
¹³https://www.digitalocean.com/
¹⁴https://github.com/t-creo/back-end/blob/develop/deploy.sh
¹⁵https://httpd.apache.org/docs/2.4/programs/ab.html
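For reference, the 24-experiment grid can be driven by looping over the two ab options described above; this is a simplified sketch of what the script-client does (the URL, query string, and output parsing are illustrative; the real shell script and tweet payload are in the repository linked in footnote 12).

  import { execSync } from 'child_process';

  // Illustrative only: the query string stands in for the embedded tweet data.
  const url = 'http://localhost:3000/calculate/tweets/scraped?text=...&followers=1338&following=0';

  for (const c of [5, 10, 20, 30]) {                    // concurrency levels
    for (const n of [50, 100, 200, 500, 1000, 2000]) {  // total requests per run
      // -c: simultaneous connections, -n: total requests for the benchmarking session.
      const out = execSync(`ab -c ${c} -n ${n} "${url}"`).toString();
      console.log(`c=${c} n=${n}`, out.match(/Time per request:.*\(mean\)/)?.[0]);
    }
  }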


- Scenario b: Two remote scenarios with machines in SF: the TABLE 5. Input Parameters for the /Calculate/Tweets/Scraped Endpoint
script-client is executed in a machine different from the back-
end, resulting as SF Low-SF Low and SF High-SF High.
- Scenario c: Two remote scenarios with the script-client
executed in a SF Low machine and the back-end in NY Low
and High machines, having SF Low - NY Low and SF Low
-NY High. In this scenario, we execute the script-client at the
Low machine configuration, since users can access the service
from mobile devices or desktop PCs, which are not necessary
powerful machines.
In total, we present in Section V-C results for 24×6 = 144
experiments.
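As an illustration of how one scenario's batch of runs can be driven, the sketch below loops over the concurrency and request levels described above and invokes Apache Benchmark for each combination. It is only a simplified stand-in for the published evaluation script (footnote 12); the host, port, and the omission of the endpoint's query parameters are assumptions.

```javascript
// Sketch only: a simplified stand-in for the published evaluation script.
// The host/port are assumptions; the query parameters of Table 5 are omitted for brevity.
const { execSync } = require('child_process');

const CONCURRENCY_LEVELS = [5, 10, 20, 30];
const REQUEST_COUNTS = [50, 100, 200, 500, 1000, 2000];
const ENDPOINT = 'http://localhost:3000/calculate/tweets/scraped';

for (const c of CONCURRENCY_LEVELS) {
  for (const n of REQUEST_COUNTS) {
    // ab flags: -n total number of requests, -c simultaneous connections
    const report = execSync(`ab -n ${n} -c ${c} "${ENDPOINT}"`).toString();
    console.log(`concurrency=${c} requests=${n}\n${report}`);
  }
}
```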
Although T-CREo framework offers two methods to
extract the information and to calculate the credibility (i.e.,
web scraping and Twitter API), we did not test the end-
point that uses the Twitter API, as it differs from web
scraping only in the technique to fetch the data. Further-
more, calls to the Twitter API are rate-limited and incur
network latency, which is not part of the scope of our
discussion. In addition to having two methods to obtain a
credibility score, T-CREo has endpoints to calculate the spe-
cific scores of the credibility formula, i.e., Text, User, and
Social scores. In the experimental evaluation, we only test
the /calculate/tweets/scraped endpoint, since it
encompasses all the operations of those endpoints.
The input parameters remain constant throughout the tests, to minimize the complexity and to obtain a more accurate comparison of the results across the tests. The input parameters for the /calculate/tweets/scraped endpoint are exhibited in Table 5. The User Credibility and Social Credibility parameters (e.g., followers, following, verified) are attributes of a randomly selected user at the time of the study, and all weight parameters (i.e., wSPAM, wBadWords, and wMisspelledWords) were also chosen randomly.

TABLE 5. Input Parameters for the /calculate/tweets/scraped Endpoint
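For illustration only, a request to the endpoint could carry the user attributes and weights as query parameters, as in the following sketch; the parameter names and values shown here are assumptions made for the example and do not reproduce the exact fields of Table 5.

```javascript
// Hypothetical parameter names; consult Table 5 / the API documentation for the real ones.
const params = new URLSearchParams({
  tweetText: 'Example tweet text to be analyzed',
  weightSpam: '0.33',        // corresponds to wSPAM
  weightBadWords: '0.33',    // corresponds to wBadWords
  weightMisspelling: '0.34', // corresponds to wMisspelledWords
  followers: '1500',         // attributes of the selected user
  following: '300',
  verified: 'true',
});

// Base URL for a local deployment (assumption).
console.log(`http://localhost:3000/calculate/tweets/scraped?${params.toString()}`);
```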
B. TEST METRICS
Apache Benchmark has a wide range of options to perform the requests and returns various measures in its output. Since the focus of our tests is performance and scalability, we use the following metrics:
• Time Per Request: the average time spent per request.
• Number of Requests Per Second: the result of dividing the number of requests by the total time taken to resolve all of them.
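As an illustration of how these two metrics relate, if a benchmarking session of 2000 requests completes in 2.8 s in total, Apache Benchmark reports 2000/2.8 ≈ 714 requests per second, and the corresponding time per request, averaged across all concurrent requests, is 2.8 s / 2000 = 1.4 ms; for a given run the two metrics are therefore reciprocal.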
C. RESULTS
Table 6 shows the average time per request for the 144 experiments. In scenario a, where the network access is not considered (i.e., the script-client and the back-end are executed in the same droplet), the capacity of the server (Low or High) has no impact on this metric. Results are quite similar in all experiments in this scenario; the average is 1.87ms and 1.57ms for SF Low and SF High, respectively. The difference between MIN and MAX values (≤2ms) observed in this scenario, for both cases, evidences that the number of requests and connections (i.e., Concurrency) does not affect the behavior of the back-end; it remains stable, independently of these factors. Then, under this scenario, the back-end is highly scalable: the number of requests and the level of concurrency do not impact the time per request. Although these local tests do not represent an environment fully adapted to a real-life use case, because this environment means that the user is deploying the server on their local machine, these tests allow analyzing the performance of the back-end ignoring factors such as latency, bandwidth, and the geographic region of the user.

TABLE 6. Time Per Request at /calculate/tweets/scraped in ms

Results in scenarios b and c show that the effect of Internet access is also impacted by the capacity of the server and the proximity of locations between client and server. In scenario b, with High capacity and close locations (i.e., SF High - SF High), the back-end is able to attend each request as in the local scenario a, with an average of 1.43ms per request, and it remains stable independently of Requests and Concurrency (the difference between the MIN and MAX values is very low: MAX-MIN = 1.21ms). This is the best result obtained for the three scenarios, representing an average of up to 23% improvement with respect to the local scenario a; this is because in scenario a, the back-end is sharing resources with the script-client. Meanwhile, with Low capacity, even though client and server are close (i.e., SF Low - SF Low), the average time per request increases around 53% and 64%. Thus, under these conditions, scalability is bounded by the capacity of the servers.
In scenario c, the fluctuations of the back-end increase, reaching a difference of up to 73.59ms between the MIN and MAX observed values (for SF Low - NY Low).
As in previous scenarios, the number of requests does not affect the behaviour of the back-end; however, the level of concurrency generates different results: as concurrency increases, the time per request decreases. Although the level of concurrency does not represent simultaneously attended requests, but simultaneous connections, the improvement is due to the fact that the greater the number of connections, the more requests are going through the network, achieving overlapping of computation with communication. Thus, the server waits less for the requests to arrive; they are already on its machine when it finishes one request. Hence, the back-end remains scalable in these conditions.

In general, these results show that the average time per request increases as the remoteness increases and the server capacity is lower. Nevertheless, the level of concurrency allows the back-end to remain scalable; in fact, as the level of concurrency increases, the time per request decreases. This result can be better appreciated in Figure 9. Note that the time per request in scenarios in which network access has no impact (i.e., Local SF Low, Local SF High, and SF High - SF High) does not highly vary across Requests and Concurrency, while the time per request in scenarios in which Internet access has an impact (i.e., SF Low - SF Low, SF Low - NY Low, and SF Low - NY High) does not highly vary as the number of requests increases and, instead, decreases as the level of concurrency increases.

FIGURE 9. Time per request from Table 6.

Table 7 shows the results for the same 144 experiments, but measuring the number of requests per second. As expected, the best results are obtained in the scenarios where the best times per request were obtained: Local SF Low, with 569.55 requests/s; Local SF High, with 664 requests/s; and SF High - SF High, with 716 requests/s. The worst result is for SF Low - NY Low, with 52 requests/s, because of the network access; this impact is reduced by the number of connections, as we explained before (see Figure 10).

TABLE 7. Number of Requests Per Second at /calculate/tweets/scraped

FIGURE 10. Requests per second from Table 7.

This battery of experiments demonstrates that the back-end is scalable, even though the CPU capacity is not fully used: Low capacity droplets reach, on average, 40% of use of the CPU, while the High capacity ones barely consume 0.5% of the CPU. Better management of resources, such as multi-threading to actually serve several requests simultaneously, should improve these results even more.

Concerning the real-time capacity of T-CREo, we consider the definition provided in the Realtime API Hub site16: real-time refers to a synchronous and bi-directional communication channel between endpoints at a speed of less than 100ms. Although 100ms is an arbitrary number, experiments conducted with 81,000,000 people determined that the median reaction time for humans is 273ms, whereas the average reaction time is 284ms. Anything below this value is considered to deliver a satisfactory user experience.17 Since the highest average time per request obtained in our experiments is 24.67ms and the highest maximum obtained value is 83.88ms (see rows AVG and MAX in Table 6, respectively), we can conclude that T-CREo performs in real-time.

VI. DISCUSSION
The T-CREo implementation demonstrates the feasibility of a scalable system for real-time credibility analysis in social networks. This experience also gives the opportunity of extracting its current limitations and some lessons learnt.

A. IMPROVE THE PERFORMANCE
As outlined in Section IV-B2, the back-end is developed with Express, a framework for Node.js, a JavaScript runtime built on Chrome's V8 JavaScript engine. JavaScript is a single-threaded language and does not have a native way of creating threads to parallelize the work. A possible solution available in Node.js to overcome this limitation is the usage of asynchronous tasks, which are frequently used to parallelize I/O operations, such as reading files and network calls. All endpoints in the current version of T-CREo are implemented as synchronous tasks, since they do not perform network requests or file readings. Making a synchronous task behave as an asynchronous task is counterproductive and not recommended. Accordingly, T-CREo's endpoints are able to handle only one request at a time. The concurrency level in the experiments represents the simultaneous connections that the back-end has to manage; it does not represent the number of attended simultaneous requests.
This limitation affects the framework performance, as shown in Section V: the server with the lowest resources consumes ∼40% of the CPU, while the one that has the largest resources barely consumes ∼0.5%. It is obvious that we should take actions to make a better usage of the resources of the back-end servers. Since asynchronous tasks are not an appropriate alternative to implement concurrency, other possibilities should be tried, such as:
• Run several replicas (or instances) of the server and let a scheduler web server (such as Nginx18) attend the requests in a round-robin fashion or in any other available scheduling strategy. This way, it can mimic multi-threading and parallelism by handling several requests at the same time.
• Use a new experimental feature of Node.js, named Worker Threads,19 to implement multi-threaded applications in a single Node process (a sketch is given after this list). The downside is that, for our current Node.js version, it is a non-stable feature. Nonetheless, it is stable for the latest long-term support version of Node.js, v14.15.0, so an upgrade can be made before deciding to implement this feature.
• One of the advantages of using JavaScript for this first version of T-CREo is the fast implementation; however, it does not scale very well in production environments. We can port the code to a different programming language that supports threads and has better performance.

16 https://realtimeapi.io/hub/realtime-api-design-guide/
17 Tested by Robert Miller, https://humanbenchmark.com/
18 https://www.nginx.com/
19 https://nodejs.org/dist./v10.16.0/docs/api/worker_threads.html
The current implementation is well modularized and has a small number of functions (roughly 2000 lines of code, mostly configuration); thus, porting T-CREo to another programming language would not take a lot of effort.
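As a sketch of the second option above (Worker Threads), the credibility computation could be offloaded to a worker so that the event loop keeps accepting connections. The file name and the computeCredibility function below are assumptions, not part of the current code base.

```javascript
// Main process (sketch): delegate a CPU-bound scoring task to a worker thread.
const { Worker } = require('worker_threads');

function scoreInWorker(payload) {
  return new Promise((resolve, reject) => {
    // credibility-worker.js is a hypothetical file name.
    const worker = new Worker('./credibility-worker.js', { workerData: payload });
    worker.once('message', resolve); // result computed by the worker
    worker.once('error', reject);
  });
}

// credibility-worker.js (sketch): runs in a separate thread.
// const { parentPort, workerData } = require('worker_threads');
// const result = computeCredibility(workerData); // existing synchronous scoring code
// parentPort.postMessage(result);
```

Note that on the Node.js v10.x line the worker_threads module must be enabled with the --experimental-worker flag, which is one more reason to first upgrade to the v14 LTS line.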
Another idea to improve the performance is to use a search engine or an external database to implement some of the components of the data provider layers from Figure 3. In the current implementation, the data providers layer runs in the same process as the T-CREo back-end; therefore, the dictionary lookup algorithms are implemented in JavaScript, which is not ideal for highly intensive CPU operations. This also makes the server take some time (approx. 1 minute) to start and be able to receive requests. By delegating the lookups for the misspelling and bad-words filters to a separate application, such as Elasticsearch20 or even a simple RDBMS, like PostgreSQL, the server would be able to receive requests earlier and the lookups could be made in a more efficient way.

20 https://www.elastic.co/es/what-is/elasticsearch
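For instance, the bad-words lookup could be delegated to PostgreSQL as sketched below; the table and column names are assumptions, and an equivalent membership query could be issued against Elasticsearch instead.

```javascript
// Sketch: membership test against a bad_words table instead of an in-process dictionary.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function isBadWord(word) {
  const { rowCount } = await pool.query(
    'SELECT 1 FROM bad_words WHERE word = $1 LIMIT 1', // assumes an index on bad_words(word)
    [word.toLowerCase()]
  );
  return rowCount > 0;
}
```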
B. EXPERIMENTAL EVALUATION OF THE TIME COMPLEXITY
Regarding the evaluation of the credibility model, it was not worth doing experiments to demonstrate its O(n) time complexity, with n representing the number of words in the analysed text and being bounded by the Text Credibility (see Section III-E). Since the maximum text length on Twitter is quite short, it does not represent an appreciable time. However, in other social media without a text size limitation, the impact of the Text Credibility can be better measured.

C. EXTEND THE IMPLEMENTATION FOR OTHER SOCIAL NETWORK PLATFORMS
Although the current version of the T-CREo framework only supports Twitter, the proposed architecture works for any social network on which we can either run web scraping or use an API. This flexibility allows integrating other social media networks into T-CREo. Integrating three other social networks, such as LinkedIn, Reddit, and Facebook, is also feasible.
LinkedIn is an employment-oriented social network, mainly used for professional networking. There are several kinds of users, but the most important (and the ones we focus on)
is the regular user type. These users usually are job seekers and company recruiters. Adding support for LinkedIn in T-CREo can be done via web scraping on the user's activity feed, similar to how it is implemented for Twitter users' profiles. That section can be visited by going to a user's profile and clicking ''See All'' on the Activity section. There, we can scrape each post, which contains text and other attributes that can enhance the model. The analogy of a ''verified user'' from Twitter can be ported to a ''premium user'' on LinkedIn. The only attribute that is not available to scrape is the creation year of the account, but it can be replaced by the oldest year in their work experience. LinkedIn has an API, but
it is very limited. We need to request permission from users to access their information, unlike Twitter, where we can use our API token to access any user's information without their permission.
Reddit describes itself as ''the front page of the Internet''. It is a social network where users share posts in sub-forums on the platform; each sub-forum is focused on a topic and is maintained by users themselves. Applying web scraping on Reddit is a hard task since the HTML contents of the page are generated automatically by third-party software, but their API offers the necessary resources for an integration on T-CREo to be possible. The credibility model might need to be modified, since some fields are not available on Reddit, or have another meaning. For example, there is a following and followers notion, but another important metric for social credibility is karma, which is a number that increases as the user is more active in communities and makes good contributions.
On Facebook, we cannot do web scraping for the same reasons as Reddit. The CSS selectors used for scraping are randomly generated on each new release and change very frequently. They have an API, but we need to explicitly request permission from other users to access their information, as on LinkedIn. These problems make it difficult to make an integration with Facebook through either of the available choices, although it is not impossible to implement. In scenarios in which sources must be trustworthy and reliable, such as journalist and governor accounts, it should be a must for them to grant permission for credibility analysis (in an ideal world!).

D. CREDIBILITY ANALYSIS OF TEXT ON IMAGES
Another enhancement for T-CREo is to read text from images and run our credibility model with that text. This can be easily done by running Optical Character Recognition21 (OCR) software on the back-end. The integration can be done in one of two ways:
• A dedicated endpoint, say /image, that receives the same parameters as /plain-text but, instead of a text, receives an image URL or a sequence of bytes that represents the image (a sketch is given below).
• Enhancing our Twitter API integration to get all the images of a tweet and include the text found in any of the images in the tweet's text.

21 https://en.wikipedia.org/wiki/Optical_character_recognition

Although we do not have experience implementing or using OCR libraries on Node.js, a search of
''Optical Character Recognition'' and ''OCR'' on npm shows that all popular choices are bindings of Tesseract,22 which is an OCR engine open-sourced by HP. node-tesseract-ocr23 seems like a good choice for this, since its bindings are really similar to the bindings of the original Tesseract implementation.

22 https://github.com/tesseract-ocr/tesseract
23 https://www.npmjs.com/package/node-tesseract-ocr
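A minimal sketch of the dedicated endpoint proposed in the first option above could look as follows; the route, field names, and configuration are assumptions, and node-tesseract-ocr additionally requires the Tesseract binary to be installed on the server.

```javascript
// Sketch only: a possible /image endpoint (not part of the current T-CREo API).
const express = require('express');
const tesseract = require('node-tesseract-ocr');

const app = express();
app.use(express.json());

app.post('/image', async (req, res) => {
  try {
    // req.body.image: path or buffer of the image to analyze (assumed field name).
    const text = await tesseract.recognize(req.body.image, { lang: 'eng' });
    // The extracted text could then be fed to the existing text-credibility scoring.
    res.json({ extractedText: text });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3001); // port is an assumption
```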
E. REDUNDANCY AS A STRATEGY TO OVERCOME SOCIAL MEDIA API LIMITATIONS
Among the studied social media, Twitter is the one that has the most flexible permissions, but it still has some limitations on the number of requests that we can consume in a time window. This limitation is not exclusive to the Twitter API. A solution for this problem is to keep a shallow copy of the requested tweets in a database that T-CREo's back-end can query and synchronize regularly with the real Twitter data. A synchronization strategy can be to run a scheduled job until all data is updated. The endpoint to get a tweet's details allows a certain number of requests per 15-minute time window. We can implement a job that updates as many tweets as it can and updates the remaining tweets of our internal database 15 minutes later.
This would not only overcome the problem with request usage limits, but can also be used to improve the credibility model from [4], by storing more properties and data.
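A minimal sketch of such a synchronization job is shown below; the storage (here an in-memory map) and the Twitter call are stand-ins, and the quota value is illustrative rather than the documented limit of the endpoint.

```javascript
// Sketch: refresh the shallow copies kept in the local store, one rate-limit window at a time.
const FIFTEEN_MINUTES = 15 * 60 * 1000;
const QUOTA_PER_WINDOW = 300; // illustrative; use the documented limit of the endpoint

const cache = new Map(); // tweetId -> { data, fetchedAt }

async function fetchTweetFromTwitter(id) {
  // Placeholder for the Twitter API call used by the back-end.
  return { id };
}

async function synchronizeCache() {
  // Refresh the oldest copies first, never exceeding the per-window quota.
  const stale = [...cache.entries()]
    .sort((a, b) => a[1].fetchedAt - b[1].fetchedAt)
    .slice(0, QUOTA_PER_WINDOW);

  for (const [id] of stale) {
    cache.set(id, { data: await fetchTweetFromTwitter(id), fetchedAt: Date.now() });
  }
}

// Whatever could not be refreshed in this window is picked up 15 minutes later.
setInterval(synchronizeCache, FIFTEEN_MINUTES);
```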
VII. CONCLUSION AND FUTURE WORK
In this work, we propose a general architecture of a framework for credibility analysis in social media based on a general credibility model. The framework is capable of calculating credibility on any social media in real-time, combining web-scraping and social media APIs to gather the parameters needed to instantiate the credibility model. A proof of concept for a specific use case of Twitter, named T-CREo (Twitter CREdibility analysis framework), is developed and tested to show the feasibility of the proposed architecture and to evaluate its performance. Results show that our proposed framework can be implemented as a real-time service and that scalability is ensured by increasing the level of concurrency. This experience allows outlining some suggestions to improve overall performance for high-capacity servers. The modularity and simplicity of T-CREo, and the use of the credibility model, enable the creation of a real-time service; however, the connection time (latency) can be a determining factor that might be considered in the deployment of the system.
Our future research is focused on the improvement of T-CREo, starting with the suggestions from Section VI, such as the implementation of several instances or multi-threaded versions of the back-end to improve the performance, keeping an external database of posts to overcome API limitations, and incorporating credibility analysis in other social platforms, to provide a robust architecture to the community for the development of third-party applications. We also plan to extend the credibility model by considering bot detection, semantic analysis of the text, and multimedia data analysis.

ACKNOWLEDGMENT
The authors would like to thank Sergio Barrios, Fabiola Martínez, Yuni Quintero, José Acevedo, and Nairelys Hernández, who have contributed to the implementation and experiments.

REFERENCES
[1] D. Westerman, P. R. Spence, and B. Van Der Heide, ''Social media as information source: Recency of updates and credibility of information,'' J. Comput.-Med. Commun., vol. 19, no. 2, pp. 171–183, Jan. 2014.
[2] Y. Kammerer, E. Kalbfell, and P. Gerjets, ''Is this information source commercially biased? How contradictions between Web pages stimulate the consideration of source information,'' Discourse Process., vol. 53, nos. 5–6, pp. 430–456, Jul. 2016.
[3] J. Slomian, O. Bruyère, J. Y. Reginster, and P. Emonts, ''The Internet as a source of information used by women after childbirth to meet their need for information: A Web-based survey,'' Midwifery, vol. 48, pp. 46–52, May 2017.
[4] I. Dongo, Y. Cardinale, and A. Aguilera, ''Credibility analysis for available information sources on the Web: A review and a contribution,'' in Proc. 4th Int. Conf. Syst. Rel. Saf. (ICSRS), Nov. 2019, pp. 116–125.
[5] S. Y. Rieh and D. R. Danielson, ''Credibility: A multidisciplinary framework,'' Annu. Rev. Inf. Sci. Technol., vol. 41, no. 1, pp. 307–364, 2007.
[6] T. J. Johnson and B. K. Kaye, ''Reasons to believe: Influence of credibility on motivations for using social networks,'' Comput. Hum. Behav., vol. 50, pp. 544–555, Sep. 2015.
[7] Omnicore. (2020). Twitter by the Numbers: Stats, Demographics & Fun Facts. [Online]. Available: https://www.omnicoreagency.com/twitter-statistics
[8] C. Castillo, M. Mendoza, and B. Poblete, ''Information credibility on Twitter,'' in Proc. 20th Int. Conf. World Wide Web, 2011, pp. 675–684.
[9] B. Kang, T. Höllerer, and J. O'Donovan, ''Believe it or not? Analyzing information credibility in microblogs,'' in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining, Aug. 2015, pp. 611–616.
[10] S. M. Shariff, X. Zhang, and M. Sanderson, ''On the credibility perception of news on Twitter: Readers, topics and features,'' Comput. Hum. Behav., vol. 75, pp. 785–796, Oct. 2017.
[11] M. Alrubaian, M. Al-Qurishi, A. Alamri, M. Al-Rakhami, M. M. Hassan, and G. Fortino, ''Credibility in online social networks: A survey,'' IEEE Access, vol. 7, pp. 2828–2855, 2019.
[12] M. Viviani and G. Pasi, ''Credibility in social media: Opinions, news, and health information—A survey,'' Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 7, no. 5, p. e1209, Sep. 2017.
[13] H. S. Al-Khalifa and R. M. Al-Eidan, ''An experimental system for measuring the credibility of news content in Twitter,'' Int. J. Web Inf. Syst., vol. 7, no. 2, pp. 130–151, Jun. 2011.
[14] A. Gupta, P. Kumaraguru, C. Castillo, and P. Meier, ''Tweetcred: Real-time credibility assessment of content on Twitter,'' in Proc. Internat. Conf. Social Informat., 2014, pp. 228–243.
[15] M. AlRubaian, M. Al-Qurishi, M. Al-Rakhami, M. M. Hassan, and A. Alamri, ''CredFinder: A real-time tweets credibility assessing system,'' in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2016, pp. 1406–1409.
[16] T. Stephanie, ''Spot the lie: Detecting untruthful online opinion on twitter,'' Ph.D. dissertation, Dept. Comput., Imperial College London, London, U.K., 2017. [Online]. Available: https://www.doc.ic.ac.uk/~oc511/reportStephanie.pdf
[17] J. Yang, M. Yu, H. Qin, M. Lu, and C. Yang, ''A twitter data credibility framework—Hurricane Harvey as a use case,'' ISPRS Int. J. Geo-Inf., vol. 8, no. 3, p. 111, Feb. 2019.
[18] A. Iftene, D. Gîfu, A.-R. Miron, and M.-S. Dudu, ''A real-time system for credibility on Twitter,'' in Proc. 12th Lang. Resour. Eval. Conf., 2020, pp. 6166–6173.
[19] K. R. Saikaew and C. Noyunsan, ''Features for measuring credibility on facebook information,'' Int. Scholarly Sci. Res. Innov., vol. 9, no. 1, pp. 174–177, 2015.
[20] A. M. Idrees, F. Kamal, and A. I., ''A proposed model for detecting facebook News' credibility,'' Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 7, pp. 311–316, 2019.
[21] A. Black, C. Mascaro, M. Gallagher, and S. P. Goggins, ''Twitter zombie: Architecture for capturing, socially transforming and analyzing the Twittersphere,'' in Proc. 17th ACM Int. Conf. Supporting Group Work, 2012, pp. 229–238.
[22] M. Congosto, P. Basanta-Val, and L. Sanchez-Fernandez, ''T-hoarder: A framework to process Twitter data streams,'' J. Netw. Comput. Appl., vol. 83, pp. 28–39, Apr. 2017.
[23] A. Hernandez-Suarez, G. Sanchez-Perez, K. Toscano-Medina, R. Toscano-Medina, V. Martinez-Hernandez, J. Olivares-Mercado, H. Pérez-Meana, and V. Sanchez, ''Can Twitter API be bypassed? A new methodology for collecting chronological information without restrictions,'' in Proc. SoMeT, 2018, pp. 453–462.
[24] A. Hernandez-Suarez, G. Sanchez-Perez, K. Toscano-Medina, V. Martinez-Hernandez, V. Sanchez, and H. Pérez-Meana, ''A Web scraping methodology for bypassing Twitter API restrictions,'' CoRR, vol. abs/1803.09875, pp. 1–8, Mar. 2018.
[25] D. Freelon, ''Computational research in the post-API age,'' Political Commun., vol. 35, no. 4, pp. 665–668, Oct. 2018.
[26] B. Kusumasari and N. P. A. Prabowo, ''Scraping social media data for disaster communication: How the pattern of Twitter users affects disasters in Asia and the pacific,'' Natural Hazards, vol. 103, no. 3, pp. 3415–3435, Sep. 2020.
[27] I. Dongo, Y. Cadinale, A. Aguilera, F. Martínez, Y. Quintero, and S. Barrios, ''Web scraping versus Twitter API: A comparison for a credibility analysis,'' in Proc. 22nd Int. Conf. Inf. Integr. Web-Based Appl. Services, Nov. 2020, pp. 1–11.
[28] O. Goonetilleke, T. Sellis, X. Zhang, and S. Sathe, ''Twitter analytics: A big data management perspective,'' ACM SIGKDD Explor. Newslett., vol. 16, no. 1, pp. 11–20, 2014.
[29] R. Giovanetti and L. Lancieri, ''Model of computer architecture for online social networks flexible data analysis: The case of Twitter data,'' in Proc. IEEE/ACM Internat. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2016, pp. 677–684.
[30] M. Fu, A. Agrawal, A. Floratou, B. Graham, A. Jorgensen, M. Li, N. Lu, K. Ramasamy, S. Rao, and C. Wang, ''Twitter heron: Towards extensible streaming engines,'' in Proc. IEEE 33rd Int. Conf. Data Eng. (ICDE), Apr. 2017, pp. 1165–1172.
[31] J. Jarrett, K. Hemmings-Jarrett, and M. B. Blake, ''Towards a service-oriented architecture for pre-processing crowd-sourced sentiment from Twitter,'' in Proc. IEEE Int. Conf. Web Services (ICWS), Jul. 2019, pp. 163–171.
[32] Y. Namihira, N. Segawa, Y. Ikegami, K. Kawai, T. Kawabe, and S. Tsuruta, ''High precision credibility analysis of information on Twitter,'' in Proc. Int. Conf. Signal-Image Technol. Internet-Based Syst., Dec. 2013, pp. 909–915.
[33] C. Shao, G. L. Ciampaglia, A. Flammini, and F. Menczer, ''Hoaxy: A platform for tracking online misinformation,'' in Proc. 25th Int. Conf. Companion World Wide Web, 2016, pp. 745–750.
[34] T. Hamdi, H. Slimi, I. Bounhas, and Y. Slimani, ''A hybrid approach for fake news detection in Twitter based on user features and graph embedding,'' in Proc. Int. Conf. Distrib. Comput. Internet Technol. Bhubaneswar, India: Springer, 2020, pp. 266–280.
[35] I. Dongo, Y. Cardinale, and R. Chbeir, ''RDF-F: RDF datatype inFerring framework,'' Data Sci. Eng., vol. 3, no. 2, pp. 115–135, Jun. 2018.

YUDITH CARDINALE received the degree (Hons.) in computer engineering from Universidad Centro-Occidental Lisandro Alvarado, Venezuela, in 1990, and the M.Sc. and Ph.D. degrees in computer science from Universidad Simón Bolívar (USB), Venezuela, in 1993 and 2004, respectively. She has been a Full Professor with the Computer Science Department, USB, since 1996. She is currently an Associate Researcher with the Universidad Católica San Pablo, Arequipa, Peru. She has written a range of scientific articles published in international journals, books, and conferences, and has participated as a member of program committees of several international conferences and journals. Her research interests include parallel processing, distributed object processing, operating systems, digital ecosystems, high performance on grid and cloud platforms, collaborative frameworks, and web services composition, including semantic web.

IRVIN DONGO received the B.Sc. degree in computer science from Universidad Católica San Pablo, Peru, in 2012, and the M.Sc. and Ph.D. degrees from the University of Pau, France, in 2014 and 2017, respectively. He was a Postdoctoral Fellow with the École Supérieure des Technologies Industrielles Avancées (ESTIA) Institute of Technology, France, from 2018 to 2019. He is currently an Associate Researcher in computer science with ESTIA Institute of Technology and also with Universidad Católica San Pablo. His research interests include the normalization and anonymization of Web resources, knowledge-bases modeling (Semantic Web), policies and management of credentials, security models and anonymization techniques, and machine/deep learning techniques for the analysis and classification of data to discover patterns and gesture recognition.

GERMÁN ROBAYO is currently pursuing the bachelor's degree in computer engineering from Universidad Simón Bolívar. Throughout his career he has developed software architecture skills by enrolling in several courses related to that subject. His research interests include programming languages design and machine learning, specifically in the field of natural language processing.

DAVID CABEZA is currently pursuing the degree with Universidad Simón Bolívar. He is also a Computer Engineer with Universidad Simón Bolívar. Throughout his career, he has developed skills in algorithms and software engineering, among others. His research interests include cryptography, software engineering, and the intersection between business, technology, and user experience.

ANA AGUILERA received the B.S. degree in computer science engineering from Lisandro Alvarado West-Central University (UCLA), Barquisimeto, Venezuela, in 1994, the B.Sc. degree (Hons.) in computer engineering from the University of Rennes I, Rennes, France, the M.S. degree in computer science from Universidad Simón Bolívar, Caracas, Venezuela, in 1998, and the Ph.D. degree in medical informatics from the University of Rennes I, in 2008. She is currently a Full Professor with the Faculty of Engineering, Escuela de Ingeniería Informática, University of Valparaíso, Valparaíso, Chile. Her research interests include fuzzy databases, data mining, social networks, and medical informatics. Since 2011, she has been a member of the Program Encouragement for Research and Innovation Researcher (PEII) Level C, Venezuela. She is a member of the Venezuelan Association for the Advancement of Science (AsoVAC) and of the Venezuelan Computer Society (SVC). She received the Magna Cum Laude Award from UCLA and the ''Très honorable'' Award for the Ph.D. thesis from Rennes I. She was accredited in the Program for Researcher Promotion of Venezuela, Candidate Level, in 1998.

SERGIO MEDINA received the bachelor's degree in computer engineering from Universidad Simón Bolívar (USB), Sartenejas, Caracas. He is currently a Software Developer with Rappi S.A.S, Bogotá, Colombia. His research interests include software architecture, event-driven, and data-intensive designs.