What Do We Really Need To Compute The Tie Strength An em - 2017 - Computer Comm
Computer Communications
journal homepage: www.elsevier.com/locate/comcom
Article info
Article history: Received 13 October 2016; Revised 29 May 2017; Accepted 3 June 2017; Available online 3 July 2017
Keywords: Tie Strength; Relationship Modeling; Social Media Networks; Network Applications
Abstract
Most existing network-based decision-support systems, such as recommender systems, require knowing users’ social context and, thus, the strength of their interactions. However, previous studies related to the usage and estimation of tie strength either assume that this parameter is given or use a computational model of their own. The amount, variety and domain specific information required to apply these models make reproducing and reusing existing results extremely costly or utterly impossible. In our research, we show empirically the relative importance of different social variables for the computation of the tie strength and propose a computational model independent of the Social Networks’ domain. Our experiments are based on a dataset obtained from a survey that involved more than 100 participants and comprised more than 500 social ties. The dataset is the first publicly available dataset to explicitly include tie strength measures.
© 2017 Elsevier B.V. All rights reserved.
∗ Corresponding author.
E-mail addresses: fl[email protected], [email protected] (F. Liberatore), [email protected] (L. Quijano-Sanchez).
http://dx.doi.org/10.1016/j.comcom.2017.06.001
0140-3664/© 2017 Elsevier B.V. All rights reserved.
1. Introduction
With the rising expansion of information technologies known as Social Media (SM), our capacity to interact, collaborate and network has highly and rapidly increased [1]. Research in a number of academic fields has shown that SM can leverage the way many problems are solved [2–5]. The main reason is that SM can offer new insights and innovative means by targeting information more effectively [6]. Proof of this is the recent use of different social measures in decision-support systems, such as recommender systems, where it has been proven that the use of SM information along with some specific measures, like tie strength estimations, can be used to aid their users in decision-making processes [7,8]. It is on this measure of tie strength (the importance of the social relationship between two individuals [9]) that this paper focuses.
In the last decades, the academic interest in tie strength has substantially grown both in model design [10–14] and in decision-support systems that use or could benefit from its computation, in the area of recommender systems [8,15,16], fraud detection [17] or viral marketing [18]. Social Network (SN) users post on their profiles a huge amount of personal information (likes and interests, photos, etc.) that can be analyzed to compute their tie strength with other users [9,19]. One of this paper’s goals is to study the potential of using SNs to extract knowledge that can be used to compute tie strength. Different SNs provide their users with different technical features to interact. Although we may find similar interaction facilities among them or, at least, facilities used for the same purposes, this fact makes it very difficult (and sometimes impossible) to obtain all the social predictors required by the different existing tie strength definitions [8–10,20] and, as a consequence, to devise a general model to compute tie strength. A simple solution could be directly asking users to rate the tie strength with their contacts [7,21,22]. However, the tasks of tagging and rating are sometimes found tedious and can generate resentment [8,23], hence decreasing the systems’ usability. Besides, in the case of tools without a public interface or Big SNs, asking users to directly rate their tie strength with all their contacts is unaffordable or simply unrealistic, a fact that should be taken into account when designing tie strength estimation.
When needing to compute tie strength, other researchers have given several different definitions according to their research domain, needs and access to the predictors that compose it. For example, it has been affirmed that tie strength could be estimated by the communication reciprocity [24], by the possession of at least one mutual friend [25], with recency of communication [26] or with the interaction frequency [10,27]. These heterogeneous and unsystematic definitions make the reutilization of others’ conclusions and/or models very difficult. Against this background, this paper aims to perform a thorough analysis of the social predictors that can be used to compute this measure. Also, their importance and
60 F. Liberatore, L. Quijano-Sanchez / Computer Communications 110 (2017) 59–74
their strength of association is studied, providing guidelines on how to abstract their concept to ensure a feasible and satisfactory computation of the tie strength. The result is a methodology independent of the SN from which social factors can be estimated and a set of conclusions that can be reused by other researchers. Our aim is to propose a general model of tie strength that could be applied to most contexts. Besides, we provide a public dataset obtained from a survey that involved more than 100 participants and thoroughly analyzed more than 500 social ties. This is the first public dataset to explicitly include tie strength measures.1 We hope that it will be a relevant contribution to researchers in the field and encourage many to pursue further investigations in this subject matter. Finally, we show how the model proposed and the insights drawn in the analysis can be used to obtain an estimation of tie strength in a financial network comprised of clients of a financial institution and their operations and relationships. The estimated strength of the tie between clients finds application in Customer Relationship Management operations, such as identifying influencers to recommend financial products.
1 Lewis et al. [28] provided a public dataset about FB users without any information regarding the tie strength. Also, MIT Human Dynamics Lab provided a public dataset [29] regarding mobile data and social dynamics of several communities, again with no specific tie strength measures.
In summary, the contributions of this paper are the following: (1) Measuring the strength of association between the tie strength and several SN variables (Section 5.1). (2) Analyzing the relevance of the proposed variables by exploring different approaches to compute the tie strength and studying their estimation precision (Section 5.3). (3) Testing others’ tie strength proposals, their applicability, efficiency and limitations (Sections 2 and 5.4). (4) Introducing a practical example of how to reapply the results of this paper’s study in a financial network (Section 5.5).
The remainder of this paper is structured as follows. The next section shows some previous works related to our research topic. A description of the research questions raised and answered in this piece of research is given in Section 3. Section 4 introduces the details of the novel dataset. Next, in Section 5 different proposals of tie strength models and a variable analysis are presented. A comparison with other literature’s models is illustrated in Section 5.4. The case study on a financial network is the subject of Section 5.5. Finally, Section 6 concludes the paper with insights and future research guidelines.
2. Literature review
The most widely regarded definition of tie strength is Granovetter’s [27]: “The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding) and the reciprocal services which characterize the tie.” This definition has provided a base for many studies that have made use of the concept [35] and has served as starting point for several researchers [10,12,20]. Thus, this research uses this seminal work as a baseline, analyzes each of these four components and studies if the best way to compute the tie strength is indeed their linear combination. This starting point and not more recent proposals [9–11] has been chosen following Petroczi et al.’s [20] justification for the correctness of Granovetter’s approach and avoiding unreproducible focuses, that are either unquantifiable models [13,14,24–26,30–33] or domain specific models [8–11,20]. As illustrated in Table 1, our approach is the only quantitative model that does not have these limitations.
Regarding the four components described by Granovetter: the amount of time is a measurement of the duration of a tie between two nodes; the intensity is defined as the degree, amount of strength or force that something has (Webster’s dictionary); the intimacy is defined as the state of being in a very personal or private relationship (Webster’s dictionary); and concerning the reciprocal services, the term “reciprocal” (of a pronoun) indicates that action is given and received by each subject (Collins dictionary), that is, actions carried out in common between two nodes in an SN.
With these four dimensions as a guide, Gilbert and Karahalios [10] identified 74 Facebook (FB) variables as potential predictors of tie strength. On the other hand, Burt [30] proposed that tie strength could be modeled by structural factors such as the network topology or informal social circles. Xiang et al. [14] proposed an unsupervised model to distinguish strong from weak ties based on profile similarity and interaction activity. Lin et al. [31] stated that tie strength is mainly influenced by social distance, manifested by factors such as socioeconomic status, education level or political affiliation. Recently, Rodríguez et al. [11] have classified tie strength within four different types of social spheres computed through a set of several factors extracted from FB and Twitter, while Arnaboldi et al. [9] have presented quantitative linear models to estimate tie strength from a set of FB variables. Quijano-Sánchez et al. [8] proposed a non-intrusive method to compute tie strength by automatically analyzing users’ FB profiles as opposed to other works [33] that needed to explicitly ask for the data that conforms the tie strength. They concluded that to move from theory [25,27,30,31] to practice [8–10,20] it is important to note that the factors used to compute tie strength are not easy to quantify and are limited by the capabilities of the API from which they are extracted. Also, Hossmann et al. [32] showed, through two datasets obtained from both FB and Twitter, that tie strength is coupled with mobility and communication. In this line, Socievole et al. [13] performed an analysis showing that, in general, FB variables are strongly related to tie strength. Finally, Pappalardo et al. [34] present a quantitative measure of tie strength that, although it has not been validated against real tie strength measures, represents a SN domain independent approach. Albeit theoretically sound, their model needs as input social network variables such as the cardinality of the neighborhood of all the actors involved or the dimension relevance [36], that due to privacy issues or domain restrictions may not be available in other designed applications or researches, therefore limiting its practical applicability.
As illustrated in Tables 1 and 2, the heterogeneous, unsystematic and domain dependent definitions of tie strength make the reutilization of others’ conclusions and/or models very difficult. Petroczi et al. [20] affirmed that Granovetter’s four indicators are the actual components of tie strength, whereas contextual contingencies (communication reciprocity [24], possessing at least one mutual friend [25], recency of communication [26] or social distance [31]) are predictors. Predictors are related to tie strength but are not components of it. This paper focuses on the components and how to identify a SN domain independent predictor for each of them. Besides, although for the last 30 years many attempts have been made to find valid indicators and predictors of tie strength (see Table 2), Table 1 shows how most of these studies’ results [13,14,24–26,30–33] are based on nominal data or binary indicators and, hence, they are not suitable for quantitative analysis. That is, most of the studies so far attempt to simply use and apply former knowledge on tie strength rather than try to actually measure these ties [37]. On the other hand, those that do propose quantitative results [8–11,20,34] do not provide a concrete model or, in other cases, a way to quantify or abstract to other contexts the predictors that they use. Also, these works providing quantitative results do not make their datasets publicly available. Without this data it is impossible to reproduce their results. Additionally, these studies make use of specific SN attributes and, therefore, their tie strength computation cannot be extrapolated to other networks. Hence, in order to unify tie strength definitions and avoid the constant creation of new application specific models, this pa-
Table 1
Comparison of different tie strength research papers. In each column, a cross (X) identifies the articles where: the model uses variables that are specific to FB and therefore not reusable in other domains; the model uses variables that are specific to another SN (e.g. Twitter) or to a concrete virtual community and therefore not reusable in other domains; the model is a quantitative approach to estimate tie strength.
Granovetter [27] X
Burt [30] X
Lin et al. [31] X
Friedkin [24] X
Shi et al. [25]
Lin et al. [26] X
Hossmann et al. [32] X X
Socievole et al. [13]
Arnaboldi et al. [9] X X
Xiang et al. [14]
Rodríguez et al. [11] X X X
Gilbert and Karahalios [10] X X
Petroczi et al. [20] X X
Huszti et al. [33] X
Pappalardo et al. [34] X X
Quijano-Sánchez et al. [8] X X
Our goal X
per’s Section 5 presents an empirical study of which social predictors are really crucial in the tie strength computation and their correlation. It is important to note that the proposed tie strength predictors are context independent and, therefore, can be automatically extrapolated from most SNs, as we later detail.
3. Research questions
The goal of this paper is to answer the following research questions:
• RQ1: What are the components of the tie strength?
• RQ2: What are good predictors for those components?
• RQ3: What is the performance of the tie strength model proposed by Granovetter in the context of SM?
• RQ4: In the context of SM, what is the effect of introducing other components than those proposed by Granovetter?
• RQ5: What is the benefit of considering a non-linear model of tie strength in the context of SM?
• RQ6: How effective are other state-of-the-art tie strength approaches?
• RQ7: How reproducible are the obtained conclusions about tie strength in other domains?
Hence, the methodology adopted is detailed in the following:
1. The first step of the research consists of a survey regarding friendships. A voluntary group of self-selected FB users answered a questionnaire comprised of a set of questions regarding the participant, and another set concerning their relationship with five randomly selected FB friends.2 Most importantly, participants provide an explicit evaluation of their tie strength with the selected friends. Survey responses are then processed into a dataset that is used to answer the research questions. Section 4 presents the questionnaire and the dataset in detail.
2. Next, the strength of association between the variables obtained from the dataset and the tie strength is measured (Section 5.1). Both nominal and numerical variables are considered, therefore we make use of a methodology that can be applied to both qualitative and quantitative variables, while still providing comparable results. This analysis allows the ranking of the variables according to their strength of association with the tie strength, thus identifying the best single predictor (RQ1).
3. The model proposed by Granovetter is studied in Section 5.3 and its performance measured. This study provides an answer to RQ3. Several models with different combinations of Granovetter’s [27] tie strength components and their equivalent SM predictors are then also tested, thus providing an answer to RQ2.
4. To understand the potential effect of including or dismissing components, in Sections 5.3.2 and 5.3.3 the performance of several models with different variable configurations is tested, thus answering RQ4. Among these models, a Complete model that considers all the variables in the dataset is also studied. Additionally, a stepwise regression is run to study the relevance of each variable in this Complete model.
5. Section 5.3.4 presents results on the effect of estimating tie strength through non-linear models, thus answering RQ5.
6. In Section 5.4, the performance of state-of-the-art tie strength approaches against this paper’s presented models is studied, thus answering RQ6.
7. Finally, in Section 5.5, the insights of this research’s proposed models are applied to compute tie strength measures between clients of a financial network, thus answering RQ7 by performing a case study that shows the applicability of this research’s conclusions in other domains.
2 We choose five friends as it is an affordable number of friends to answer the questionnaire about (five minutes each approx.) and also a number of friends most users have.
3 As of now, FB’s API only allows extracting information related to the user that grants permission but not information related to the user’s friends in the network and their mutual interaction. Hence, studies such as [8,9,32] are no longer possible.
4. Dataset description
To answer our research questions, we have designed a questionnaire that assesses different friendship aspects in general and, more specifically, the four components that Granovetter defines as pillars of tie strength computation [27], as well as the predictors of these indicators in FB. Note that the rest of FB features considered by other researchers in the tie strength computation [8–10,20] are not evaluated as it is not currently possible to automatically extract them from user profiles.3 Besides, as applications outside FB cannot automatically obtain them, results derived from those studies are only reusable for academic purposes in the context of those
experiments, a fact that does not meet our goal (to identify variables as concepts that can be abstracted and then reused in further applications). FB has been chosen as a platform for our research for the following reasons: (i) it is the most well-known and most used SN4 and (ii) the API allows for random friendship selection. However, it is important to note that all the extracted variables are directly given by respondents and not extracted from the API. The
Table 2
Comparison of different tie strength research papers. Each column represents whether the presented approach uses any of these variables (in order): intensity (or recency of communication), intimacy, friendship duration, reciprocal services, social distance (such as gender, relationship status, political and religious views), the network topology, number of common
Column labels recovered from the rotated table: Pictures, FB wall, No. of friends, Social Distance, Duration. Row labels recovered: Pappalardo et al. [34], Hossmann et al. [32].
of the tie strength directly from the source (respondents), (II) col-
the practical implication that the tie strength could have in other decision-making applications.
boards and mailing lists. Subsequently, we extended the call for
of dummy variables, one for each level, with the exclusion of the lowest that is used as the reference level. Note that, in the rest of the paper, whenever referring to a nominal or ordinal variable, we are actually referring to the corresponding set of dummy variables.
In the following, a description of the dataset5 is provided.
the relevance of the dataset and the correlation between this data and the tie strength. Also, it allows to study whether the usage of
in other studies6: skewed to the right and with the mode located
4 Please refer to: http://sproutsocial.com/insights/new-social-media-demographics/, last access on 2017/06/05 17:26:51
5 The questionnaire and the pseudonymized dataset can be found at http://portal.uc3m.es/portal/page/portal/ifibid/people/quijano/current_research/, last access on 2017/06/05 17:26:51
6 See http://royal.pingdom.com/2012/08/21/report-social-network-demographics-in-2012/, last access on 2017/06/05 17:26:51.
Table 3
Personal Questions. The table presents the number of the question, the description (including the question type and the allowed responses), the name and the type of the associated variable.
# | Text, Variable Type and Allowed Responses | Variable Name | Variable Type
Table 4
Friendship Questions. The table presents the number of the question, the description (including the question type and the allowed responses), the name and the type of the associated variable(s).
# | Text, Variable Type and Allowed Responses | Variable Name | Variable Type
(Q1) | Friend’s age (approx if you do not know it or check on their FB page for this info). Open text, accepts values in [0, 100]. | AgeFriend | Continuous [0,1]
(Q2) | Friend’s sex. Multiple choice: (1) W, (0) M. | SexFriend | Nominal
(Q3) | Rate your closeness to this person. The strength of your relationship/tie. Scale from 0 (Weak, not very strong) to 10 (Very strong). | TieStrength | Continuous [0,1]
(Q4) | What is your relationship with this person? Choose the most relevant option. Multiple choice: (6) Spouse or partner, (5) Relative, (4) Close friend, (3) Friend, (2) Co-worker, (1) Acquaintance, (0) Unknown. | Intimacy | Nominal
(Q5) | How often do you communicate with this person on Facebook (e.g., by private message, by commenting on the wall, by commenting pictures, by chat). Select the most restrictive option. Multiple choice: (6) More than once a day, (5) On a daily basis, (4) On a weekly basis, (3) On a monthly basis, (2) Every 3 months, (1) Less often than every 3 months, (0) Never. | FBIntensity | Ordinal
(Q6) | How often do you see each other? Select the most restrictive option. Multiple choice: (6) More than once a day, (5) On a daily basis, (4) On a weekly basis, (3) On a monthly basis, (2) Every 3 months, (1) Less often than every 3 months, (0) Never. | RealIntensity | Ordinal
(Q7) | How many common friends you have in FB (you can check this by looking at his/her profile and in the “friends” window located in the left hand side). Open text, accepts values in [0, 5000]. | CommonFriends | Continuous [0,1]
(Q8) | How long have you known each other (in years, approx). Open text, accepts values in [0, 100]. | RealDuration | Continuous [0,1]
(Q9) | How long have you been FB friends (in years, approx) (you can check this by looking at his/her profile and in the “information” window located at the left hand side). Multiple choice: 0, ..., 13. | FBDuration | Continuous [0,1]
(Q10) | Select approximatively the percentage of common tastes that you share. Scale from 0 (0%) to 10 (100%). | CommonTastes | Continuous [0,1]
(Q11) | Do you have contact with this person in other SN? Check all that apply. Checkboxes: Whatsapp, Twitter, Instagram, Flickr, Google +, Others, None. | Whatsapp, Twitter, Instagram, Flickr, Google, Others | Nominal
(Q12) | When doing a joint activity, how much do this person’s wishes influence your choices? Scale from 0 (very little) to 10 (a lot). | Influence | Continuous [0,1]
(Q13) | How much would you trust a movie recommendation from her/him? Scale from 0 (very little) to 10 (a lot). | Movie | Continuous [0,1]
(Q14) | How much would you trust a restaurant recommendation from her/him? Scale from 0 (very little) to 10 (a lot). | Restaurant | Continuous [0,1]
(Q15) | How much would you trust a FB app recommendation from her/him? Scale from 0 (very little) to 10 (a lot). | App | Continuous [0,1]
(Q16) | How much would you trust a recommendation from her/him for migrating to another SN? Scale from 0 (very little) to 10 (a lot). | SN | Continuous [0,1]
(Q17) | How much money would you loan to this person if asked? Multiple choice: (0) Nothing, (1) 10 euros/pounds/dollars or less, (2) 100 euros/pounds/dollars or less, (3) 1000 euros/pounds/dollars or less, (4) 10,000 euros/pounds/dollars or less, (5) more than 10,000 euros/pounds/dollars. | Loaning | Ordinal
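The variable types listed in Table 4 suggest two preprocessing steps: answers on a bounded scale end up as "Continuous [0,1]" variables, and nominal or ordinal answers are expanded into dummy variables with the lowest level as reference. The sketch below illustrates both steps under stated assumptions: the min-max rescaling is only a guess at how raw answers were mapped to [0,1] (the exact transformation is not given here), and the function names and the sample response are hypothetical.

```python
def scale_to_unit(value, lo, hi):
    """Min-max rescale a raw answer from [lo, hi] to [0, 1] (assumed transformation)."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside the allowed range [{lo}, {hi}]")
    return (value - lo) / (hi - lo)


def dummy_code(level, n_levels, prefix):
    """One 0/1 indicator per level, except level 0, which is the reference level."""
    if not 0 <= level < n_levels:
        raise ValueError(f"{level} is not a valid level for {prefix}")
    return {f"{prefix}{k}": int(level == k) for k in range(1, n_levels)}


def encode_response(raw):
    """Encode a few Table 4 answers; keys Q3, Q4, ... follow the question numbers."""
    encoded = {
        "TieStrength": scale_to_unit(raw["Q3"], 0, 10),    # scale 0..10
        "RealDuration": scale_to_unit(raw["Q8"], 0, 100),  # years, 0..100
        "CommonTastes": scale_to_unit(raw["Q10"], 0, 10),  # scale 0..10
    }
    encoded.update(dummy_code(raw["Q4"], 7, "Intimacy"))     # levels 0..6
    encoded.update(dummy_code(raw["Q5"], 7, "FBIntensity"))  # levels 0..6
    return encoded


# Hypothetical answer sheet for one friend (not taken from the dataset).
example = encode_response({"Q3": 8, "Q4": 4, "Q5": 2, "Q8": 12, "Q10": 5})
print(example)
```

With seven answer levels, each of Intimacy and FBIntensity yields six indicators (suffixes 1 to 6), and a respondent who picks the reference level gets zeros in all six.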
the respondent and the friend (for the questions, please refer to Table 4). Answers are reflected in the following variables that can be roughly divided in 4 subgroups according to their function:
4.2.1. Friend description variables
AgeFriend, SexFriend (Q1 and Q2, respectively). These variables provide some personal demographic information on the friend considered.
4.2.2. Tie strength variable
TieStrength (Q3), reflects the real tie strength between users. This is the dependent variable used in this paper’s prediction model study. This variable’s distribution, shown in Fig. 1b, presents a shape that resembles the one of the corresponding variable in Gilbert and Karahalios [10] (“How strong is your relationship?”).
4.2.3. Friendship descriptors variables
Intimacy, FBIntensity, RealIntensity, CommonFriends, RealDuration, FBDuration, CommonTastes, Whatsapp, Twitter, Instagram, Flickr,
ple, RealIntensity includes the following set of variables introduced by Arnaboldi et al. [9]: “Number of alters’ pictures in which ego appears,” “Number of ego’s pictures in which alters appear” and “Number of events in common”.
Regarding the research by Hossmann et al. [32], their results are not directly comparable due to the differences in experiment and variable design. However, similar conclusions are reached: Hossmann et al. [32] detect that tie strength is correlated to communication and physical closeness, which relates to the high correlation between FBIntensity and RealIntensity with tie strength shown in this analysis.
5.2. Factor analysis
To analyze the nature of the variables in the dataset, a factor analysis has been carried out on the data. The objective of this analysis is to ascertain whether or not the factors that compose the variables are able to explain the tie strength dimensions hypothesized by Granovetter. Due to the presence of both qualitative and quantitative variables, Factor Analysis of Mixed Data (FAMD) [41] has been applied. In fact, traditional Factor Analysis (or Principal Component Analysis) could not be applied as it requires all the variables to be quantitative. Similarly, Multiple Correspondence Analysis is used for qualitative variables. On the other hand, FAMD can be applied when quantitative and qualitative variables are simultaneously present as active elements. For the sake of this analysis, only the predictive variables (user and friendship) have been considered.
The first four factors (or dimensions) identified by the method are studied. Table 5 illustrates the variance associated to each dimension, as well as the percentage of variance and the cumulative percentage of variance. The total variance explained by the four factors is very low, 25% approx., suggesting that they are not representative of the whole dataset. Tables 6 and 7 show the coordinates and the contribution in percentage of the variables in the hidden factors, respectively. Higher values in Table 7 correspond to variables having a greater importance in the definition of the factor. In summary, the first four hidden dimensions are characterized by the following variables:
• Factor 1: Intimacy, FBIntensity, RealIntensity, RealDuration, AgeFriend, AgeRespondent, CommonTastes. This factor is concerned with long, intense and intimate relationships between older users that share many interests.
• Factor 2: AgeFriend, AgeRespondent. This factor regards mostly friendships between older users.
• Factor 3: RealIntensity, FBIntensity, Intimacy. The focus of this factor is on very intense and intimate relationships.
• Factor 4: Intimacy, RealIntensity, CountryRespondent, FBIntensity. This factor concerns intimate and intense relationships and discriminates on the country of origin of the respondent.
The factors identified by the FAMD present several overlaps. For instance, FBIntensity and RealIntensity have a strong effect on factors 1, 2, and 3. Given the lack of independence between the hidden factors, we can conclude that they cannot explain the tie strength dimensions hypothesized by Granovetter. However, this does not mean that the factor description given by Granovetter may not be accurate for tie strength estimation (this is the object of the analysis carried out in Section 5.3). What it does say is that Granovetter’s factors cannot fully characterize a friendship relationship. In fact, the objective of Factor Analysis is to describe variability in the data in terms of unobserved variables. Thus, the characteristics of a friendship are determined by more elements than intimacy, intensity, common tastes, and duration. However, the first factor identified by the analysis (i.e., the factor having the highest explained variability) includes exactly Granovetter’s factors, plus both the respondent and the friend’s ages. Therefore, this suggests that the characteristics of a friendship are, indeed, principally determined by the tie strength (as defined by Granovetter) and the age of the persons involved. This is a proof of the primary role of tie strength in the definition of friendships and relationships between people.
5.3. Linear model
As mentioned before, Granovetter [27] proposed that the tie strength is a linear combination of four factors: the amount of time, the emotional intensity, the intimacy (mutual confiding) and the
Table 5
Eigenvalues, percentage of variance and cumulative percentage of variance of the hidden factors.
Dim.1 Dim.2 Dim.3 Dim.4
Variance 3.8769 2.6875 1.9840 1.8157
% of var. 9.6923 6.7187 4.9601 4.5393
Cumulative % of var. 9.6923 16.4109 21.3710 25.9103
Table 6
Variables’ coordinates in the hidden factors.
Dim.1 Dim.2 Dim.3 Dim.4
AgeRespondent 0.3984 0.3859 0.0034 0.0437
FBFriends 0.1555 0.1346 0.0271 0.0742
Character 0.0082 0.0064 0.0354 0.1543
FBUse 0.0040 0.0526 0.0789 0.1201
AgeFriend 0.3984 0.3859 0.0034 0.0437
CommonFriends 0.0192 0.2642 0.0514 0.0095
RealDuration 0.4165 0.0930 0.0006 0.0110
FBDuration 0.0047 0.0018 0.0042 0.0092
CommonTastes 0.3765 0.1442 0.0076 0.0021
CountryRespondent 0.0823 0.0852 0.1120 0.2480
SexRespondent 0.0070 0.0066 0.0564 0.0840
SexFriend 0.0058 0.0003 0.0265 0.0147
Intimacy 0.5861 0.2352 0.4346 0.2888
FBIntensity 0.5734 0.2501 0.4537 0.2400
RealIntensity 0.4751 0.2079 0.5665 0.2655
Whatsapp 0.2852 0.1997 0.0499 0.0240
Twitter 0.0119 0.0709 0.0093 0.0477
Instagram 0.0174 0.1485 0.0619 0.0239
Flickr 0.0002 0.0029 0.0001 0.0012
Google 0.0027 0.0015 0.0011 0.0261
Others 0.0484 0.0099 0.0001 0.0842
Table 7
Variables’ contributions in percentage in the hidden factors.
Dim.1 Dim.2 Dim.3 Dim.4
AgeRespondent 10.2771 14.3600 0.1692 2.4075
FBFriends 4.0106 5.0090 1.3677 4.0850
Character 0.2121 0.2392 1.7846 8.4971
FBUse 0.1041 1.9577 3.9752 6.6141
AgeFriend 10.2771 14.3600 0.1692 2.4075
CommonFriends 0.4951 9.8320 2.5898 0.5206
RealDuration 10.7422 3.4616 0.0305 0.6083
FBDuration 0.1209 0.0666 0.2130 0.5067
CommonTastes 9.7114 5.3657 0.3825 0.1181
CountryRespondent 2.1235 3.1703 5.6468 13.6603
SexRespondent 0.1795 0.2468 2.8448 4.6236
SexFriend 0.1506 0.0117 1.3336 0.8094
Intimacy 15.1178 8.7517 21.9054 15.9041
FBIntensity 14.7896 9.3052 22.8691 13.2172
RealIntensity 12.2552 7.7369 28.5507 14.6200
Whatsapp 7.3554 7.4295 2.5152 1.3193
Twitter 0.3062 2.6383 0.4682 2.6255
Instagram 0.4476 5.5239 3.1220 1.3153
Flickr 0.0048 0.1070 0.0062 0.0658
Google 0.0706 0.0568 0.0534 1.4359
Others 1.2486 0.3700 0.0028 4.6391
expressed as:

$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i \qquad (1)$$

remark that, as in the case of the strength of association analysis (Section 5.1), the nominal and the ordinal variables are replaced by the corresponding set of dummies. The following models are considered and analyzed:

1. X=(RealDuration, RealIntensity, Intimacy, CommonTastes).
2. X=(FBDuration, RealIntensity, Intimacy, CommonTastes).
3. X=(RealDuration, FBIntensity, Intimacy, CommonTastes).
4. X=(FBDuration, FBIntensity, Intimacy, CommonTastes).

In Model 1 the known variables provided by the respondent are used as proxies of the factors identified by Granovetter. In the other models, some of these variables are substituted by their FB proxy. This allows us to assess the impact of surrogating the variables with measurements drawn from a given SN in general and from FB in particular.
Table 8 presents the linear models’ beta coefficients and their significance represented by the number of asterisks next to each beta coefficient value, as explained in the table’s legend. Intimacy1–Intimacy6 and Intensity1–Intensity6 refer to the dummy variables representing the level specified by the number in the

Table 8
Linear models’ beta coefficients. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (∗∗∗) p < 0.001; (∗∗) 0.001 ≤ p < 0.01; (∗) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.

Mean Square Error), computed on a model estimated using all the observations, the validation RMSE, computed by applying leave-one-out cross-validation on the observations, and the R2. Table 9 shows these values for the models presented so far. From the tables a number of considerations can be drawn:
- For all the linear models, the training and the validation RMSEs are very similar, suggesting that there is no overfitting in the models.
- The surrogate FB variables provide a good approximation to the real variables. In fact, in Model 4 the difference in terms of training and validation RMSE is smaller than 0.01 (a 6.73% validation RMSE gap approx.). This result suggests that SNs can indeed provide information on the users that can then be used as proxy for the real values. This agrees with the results presented by Socievole et al. [13].
- In general, the duration variables are the least significant.7 This agrees with the results of the association analysis (see Section 5.1).
- However, if tie strength had to be linearly predicted with just one variable, then the most associated one should be chosen, which in our particular case (see Section 5.1) would be Intimacy followed by CommonTastes.

Table 9
Linear models’ training RMSE, validation RMSE, and R2.

Table 10
95% confidence intervals for the beta coefficients of Model 1.

Fig. 3. 95% confidence intervals for the variables Intimacy and RealIntensity in Model 1. In the figures, the solid dots represent the coefficient value while the empty circles define the confidence interval limits.
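The training-versus-validation comparison used in the considerations above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it fits an ordinary least-squares model with the normal equations, then computes the training RMSE and the leave-one-out validation RMSE. The data are synthetic and all function names (`fit_ols`, `training_rmse`, `loocv_rmse`) are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Least-squares betas (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(AtA, Aty)


def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def training_rmse(X, y):
    """RMSE of a model estimated and evaluated on all the observations."""
    beta = fit_ols(X, y)
    return rmse([t - predict(beta, x) for x, t in zip(X, y)])


def loocv_rmse(X, y):
    """RMSE where each observation is predicted by a model fitted without it."""
    errors = []
    for i in range(len(y)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - predict(beta, X[i]))
    return rmse(errors)


# Synthetic data (an assumption for illustration, not the survey dataset).
random.seed(0)
X = [[random.random(), random.random()] for _ in range(60)]
y = [0.1 + 0.5 * a + 0.3 * b + random.gauss(0, 0.05) for a, b in X]

train, valid = training_rmse(X, y), loocv_rmse(X, y)
gap = 100 * (valid - train) / train  # relative gap in percent
print(f"training RMSE={train:.4f}, validation RMSE={valid:.4f}, gap={gap:.1f}%")
```

A small gap between the two values is what the first consideration reads as absence of overfitting; a validation RMSE much larger than the training RMSE would point the other way.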
5.3.1. Beta coefficient’s confidence intervals
Some interesting insights can be drawn by studying the confidence intervals associated to the beta coefficients in Model 1 (see Table 10). Fig. 3 provides a graphical representation. By observing the table and the figures, it can be seen that the levels of the variable Intimacy could be aggregated in two independent groups, namely {(6) Spouse or partner, (5) Relative, (4) Close friend} denoting a strong relationship, and {(3) Friend, (2) Co-worker, (1) Acquaintance, (0) Unknown} that represent relationships that have a low level of tie strength. As a consequence, if too costly to obtain, researchers could narrow down the Intimacy variable to two dimensions, i.e. high and low. On the other hand, it is not possible to clearly cluster the levels of the variable RealIntensity in different groups as their confidence intervals overlap.

5.3.2. Testing for variable elimination
The goal of the next study is to understand the impact of disregarding a variable when computing the tie strength. Since obtaining the variables’ (or the proxies’) value could be costly (or even impossible), this analysis helps a decision-maker to choose how to invest limited resources in data collection operations. The following reduced models are analyzed:

R1. X=(RealIntensity, Intimacy, CommonTastes).
R2. X=(RealDuration, Intimacy, CommonTastes).
R3. X=(RealDuration, RealIntensity, CommonTastes).
R4. X=(RealDuration, RealIntensity, Intimacy).

Table 11
Reduced models’ training RMSE, validation RMSE, and R2.
Model       Training RMSE   Validation RMSE   R2
Model R1    0.1425947974    0.14601552        0.8018588844
Model R2    0.157154153     0.1598006565      0.7593316572
Model R3    0.1751020714    0.1782862266      0.7012211783
Model R4    0.1675015779    0.1712608702      0.7265958844

Each reduced model presents only three of the four variables. The results of the experiments are illustrated in Tables 11 and 12. According to Table 11, the performance of the linear model decreases the most when Intimacy and CommonTastes are not included in the linear model (gap of 20% approx.). In fact, the validation RMSEs jump to 0.18 and 0.17, respectively, corresponding to a validation RMSE gap of 22.6% and 17.8%, respectively. On the other hand, disregarding RealDuration does not have a major impact on the performance of the linear model (validation RMSE gap of 0.4% approx.), suggesting that this variable could be removed if too expensive to obtain. These results agree with the insights drawn in the Strength of Association Analysis (see Section 5.1).
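The validation RMSE gaps quoted for the reduced models can be reproduced with a few lines of arithmetic. The sketch below assumes that the gap is the relative increase of a reduced model's validation RMSE over that of Model 1 (0.145434, as reported in Table 15(a)); the text does not spell out this definition, but it yields the 0.4%, 22.6% and 17.8% figures given above. The names `validation_gap` and `REDUCED_VALIDATION_RMSE` are illustrative.

```python
# Validation RMSEs of the reduced models, copied from Table 11.
REDUCED_VALIDATION_RMSE = {
    "R1 (no RealDuration)": 0.14601552,
    "R2 (no RealIntensity)": 0.1598006565,
    "R3 (no Intimacy)": 0.1782862266,
    "R4 (no CommonTastes)": 0.1712608702,
}

# Validation RMSE of the full Model 1, as reported in Table 15(a).
MODEL_1_VALIDATION_RMSE = 0.145434


def validation_gap(reduced_rmse, reference_rmse=MODEL_1_VALIDATION_RMSE):
    """Relative increase (in percent) of the validation RMSE over the reference."""
    return 100.0 * (reduced_rmse - reference_rmse) / reference_rmse


gaps = {name: validation_gap(v) for name, v in REDUCED_VALIDATION_RMSE.items()}

# Print the models from the least to the most harmful variable removal.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name}: {gap:.1f}%")
```

Under this definition, dropping RealIntensity (Model R2) costs roughly 10%, which places it between RealDuration and the two most informative variables.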
7 Preliminary experiments in which the duration variables were represented as ratios to the age of the respondent have shown that these derived variables have a lower strength of association with the TieStrength than the originals. These results have not been reported for the sake of brevity.

5.3.3. Variable selection
Model Complete, includes all the variables obtained from the survey. Table 13 illustrates the beta coefficients and the relative significance of the variables. According to these results the variables can be classified in order of importance. The most relevant
participant and the friend also use social networks other than the most popular to stay in touch) comes initially as a surprise. However, thanks to this result an interesting conclusion can be drawn: if two people use an uncommon social network to communicate, they probably share a special bond. This conclusion requires fur-

Table 12
Reduced linear models’ beta coefficients. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (∗∗∗) p < 0.001; (∗∗) 0.001 ≤ p < 0.01; (∗) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.

Table 13
Beta coefficients of the linear model including all the variables obtained from the survey. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (∗∗∗) p < 0.001; (∗∗) 0.001 ≤ p < 0.01; (∗) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.
Variable                        Coefficient Value
(Intercept)                     −0.0502265304           ()
CountryRespondentItaly          0.0194899591            ()
CountryRespondentSouth Korea    0.1218674273            ()
CountryRespondentSpain          0.0042439897            ()
CountryRespondentUK             −0.0065363919           ()
SexRespondent1                  0.0095698912            ()
AgeRespondent                   −0.1815949016           (.)
FBFriends                       −0.1120226078           ()
Character                       0.0648879392            (∗)
FBUse                           0.0562825939            (∗)
AgeFriend                       8.31657995645122E−005   ()
SexFriend1                      0.0013402734            ()
Intimacy1                       −0.0274997999           ()
Intimacy2                       0.0050127256            ()
Intimacy3                       0.1343400635            (∗∗∗)
Intimacy4                       0.2973986037            (∗∗∗)
Intimacy5                       0.3072834803
Intimacy6                       0.3232628227            (∗∗∗)
FBIntensity1                    0.0159442477            ()
FBIntensity2                    0.0309682562            ()
FBIntensity3                    0.0212767968            ()
FBIntensity4                    0.0418982028            ()
FBIntensity5                    0.0454347325            ()
FBIntensity6                    0.0008603143            ()
RealIntensity1                  0.090100251             (∗∗∗)
RealIntensity2                  0.1412422451            (∗∗∗)
RealIntensity3                  0.1324917828
RealIntensity4                  0.2027933576            (∗∗∗)
RealIntensity5                  0.2232402106            (∗∗∗)
RealIntensity6                  0.3070519108            (∗∗∗)
CommonFriends                   0.1521862564            ()
RealDuration                    0.2622468711            (∗∗)
FBDuration                      0.0150863664            ()
CommonTastes                    0.357199348
Whatsapp                        0.09002879              (∗∗∗)
Instagram                       −0.0209596998           ()
Flickr                          −0.1510158076           ()
Google                          0.0153396018            ()
Others                          0.0768023769            (∗∗)
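The significance legend shared by the coefficient tables can be read as a simple threshold rule on the p-value. The helper below is an illustrative sketch of that rule; the function name is hypothetical and plain ASCII asterisks stand in for the symbols printed in the tables.

```python
def significance_symbol(p):
    """Map a p-value to the symbol defined in the tables' legend."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < 0.001:
        return "(***)"
    if p < 0.01:
        return "(**)"
    if p < 0.05:
        return "(*)"
    if p < 0.1:
        return "(.)"
    return "()"


# Example: a coefficient with p = 0.03 is reported with a single asterisk.
print(significance_symbol(0.03))
```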
Table 14
Beta coefficients of the stepwise-selected linear model. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (∗∗∗) p < 0.001; (∗∗) 0.001 ≤ p < 0.01; (∗) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.
Variable          Coefficient Value
(Intercept)       0.068505009     (.)
Character         0.051745604     (.)
CommonTastes      0.360727604     (∗∗∗)
FBUse             0.068407339     (∗∗∗)
Intimacy1         −0.024458423    ()
Intimacy2         0.017839837     ()
Intimacy3         0.14534474      (∗∗∗)
Intimacy4         0.310844222     (∗∗∗)
Intimacy5         0.322514495     (∗∗∗)
Intimacy6         0.351332087     (∗∗∗)
Others            0.074617638     (∗∗∗)
RealDuration      0.135811941     (∗∗)
RealIntensity1    0.223615582     (∗∗∗)
RealIntensity2    −0.019782306
RealIntensity3    0.0392605
RealIntensity4    −0.018611744
RealIntensity5    0.001670093
RealIntensity6    0.022639923
Twitter           0.056447775     (.)
Whatsapp          0.089545628     (∗∗∗)

Table 15
Non linear models’ training RMSE and validation RMSE.
(a) Models including Granovetter’s factors. The non linear models are compared to Model 1. The best RMSE are highlighted in bold.
Model             Training RMSE   Validation RMSE
GAMS 1            0.141780        0.145434
RF 1              0.1197373       0.1500345
SVM 1             0.133752        0.1497631
Model 1           0.141780        0.145434
(b) Models including all the variables obtained from the survey. The non linear models are compared to Model Complete and Model Stepwise. The best RMSE are highlighted in bold.
Model             Training RMSE   Validation RMSE
GAMS Complete     0.1319776       0.1404617
RF Complete       0.0650688       0.1409619
SVM Complete      0.1087886       0.1576382
Model Complete    0.132021        0.142857
Model Stepwise    0.1337331       0.138569

considered in this research. In terms of error, the Training RMSE and Validation RMSE of Stepwise are 0.1337331 and 0.138569, respectively, and R2 = 0.8257. Compared to Model Complete, Stepwise has a higher Training RMSE but a lower Validation RMSE, although only by a small fraction.

5.3.4. Non linear model considerations
We now explore the effect of including nonlinearities in the definition of the tie strength. In particular, we analyze the effect of (a) considering local nonlinearities in the predictors and (b) considering models other than regression that are non-linear in nature. These objectives are achieved by (a) studying Generalized Additive Models (GAMs), and (b) Random Forests (RFs) and Support Vector Machines (SVMs).
GAMs are generalized linear models in which the predicted variable depends linearly on unknown smooth functions of some factors, and thus are capable of identifying local non-linearities at factor level. Their generalized formulation is:

$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m) \qquad (2)$$

In the GAMs considered in this study, g() is the identity function, and fi() are thin plate spline smooth functions for continuous and ordinal factors and identity functions for the nominal variables. Feature selection in GAMs can be carried out by implementing a smooth modification technique that penalizes smooth functions having no wiggliness, thus effectively dropping the factors that have no effect on the predicted variable.
RFs are an ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting the mode (classification) or mean prediction (regression) of the individual trees. Decision trees partition the factor space according to value tests, therefore resulting in a non-linear classification. Another advantage of using decision trees is that they automatically perform feature selection.
An SVM model represents observations as points in space and finds the hyperplane that separates them in the best possible way. SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping the inputs into high-dimensional feature spaces. In this work, SVMs with radial basis kernel are considered.
In this analysis, the three non linear models considered are tested on two different configurations: (I) the factors defined by Granovetter, corresponding to Model 1; (II) all the variables obtained from the survey, corresponding to Models Complete and Stepwise. Then, their results are compared to those of the corresponding linear models, as shown in Table 15.
In the table, the best RMSE are highlighted in bold and, as per the other analyses in this research, the Validation RMSEs are calculated by Leave-One-Out Cross-Validation. It can be observed that RF provides the best Training RMSE. However, the best Validation RMSE are obtained by linear models, namely Model 1 for the configuration with the four Granovetter factors and Model Stepwise for the configuration with all the variables. These results show that, for the data considered, allowing local nonlinearity in the predictors or considering nonlinear prediction functions does not result in an improvement of the estimation, supporting the proposal of Granovetter that tie strength can be indeed explained as a linear combination of factors.
Finally, Fig. 4 illustrates the smooth functions identified by the model GAMS Complete. The variables discarded by the model are those that present a constant smooth function taking value zero: AgeRespondent, FBFriends, AgeFriend, FBIntensity, CommonFriends, and FBDuration. Also, by observing the plots, it can be seen that only one variable has a very limited non-linear relationship with the tie strength: FBUse. According to the graph, this variable reaches a maximum at approximately 0.6, meaning that people that make a moderate use of FB tend to have stronger ties with their FB friends compared to people that use FB a lot or very little.

5.4. Comparison with other literature’s models
To understand this research’s usability, capabilities and limitations, the efficiency of several tie strength models proposed in the literature is now tested.
It has been suggested that the strength of a tie can be approximated by considering the intensity of the relationship between two individuals (features that represent recency of communication [26] or the interaction frequency [10,27]). According to correlation results shown in Fig. 2, the variable RealIntensity presents a strength of association with TieStrength equal to 0.64565, suggesting that there exists a moderate linear relationship between the variables. Also, from Fig. 2 it can be easily seen that variables Intimacy and CommonTastes have a higher association. A simple linear regression model built using RealIntensity and TieStrength as the independent and the dependent variables, respectively, results in: Training RMSE = 0.2301, Validation RMSE = 0.23323, and
R2 = 0.478. These results show that relying exclusively on the intensity is not enough to effectively estimate the tie strength, as it results on average in a significant error. In fact, on a scale from zero to 10, the tie strength misestimation is expected to be of more than 2 levels, on average. This goes along with the opinion of other authors [35], who claim that certain indicators such as frequency of contact or duration are unnecessary in the tie strength analysis. For example, it can be misleading to qualify a relationship as “strong” if the high frequency of contact refers to neighbors or colleagues.

Fig. 4. Smooth functions identified by model GAMS Complete. The value in the y-axis of each plot identifies the effect of the variable on the tie strength value. Panels: s(AgeRespondent,0), s(Character,0.75), s(FBFriends,0), s(FBUse,2.58), s(AgeFriend,0), s(RealIntensity,1.01), s(FBIntensity,0.26), s(CommonFriends,0), s(RealDuration,0.94), s(FBDuration,0), s(CommonTastes,1.49).

According to Marsden and Campbell [35], closeness is the only indicator which can determine the strength of relationship because it is independent of the predictors. The results presented in this research disagree with Marsden and Campbell’s [35] statement as they show that, in the tie strength computation, including duration, intensity and, more importantly, common tastes predictors does add some value. Besides, as we can see, building a simple linear regression model using Intimacy and TieStrength as the independent and the dependent variables, respectively, results in: Training RMSE = 0.1904, Validation RMSE = 0.19257, and R2 = 0.643. These results show that relying exclusively on the intimacy is not enough to estimate the tie strength, as it leads to a significant error (two levels on average, on a zero to ten scale).
On the other hand, Shi et al. [25] suggest that the existence of a common friend between two individuals indicates the presence of a strong tie between them. After building the dummy variable OneFriend, that takes value one when CommonFriends is greater than or equal to one, and zero otherwise, it can be observed that with respect to our dataset there are 19 zeros and 516 ones, and the association with TieStrength is only 0.04266. It is already clear that the variable OneFriend is unfit to discriminate different levels of tie strength. The performance of a simple linear regression model built using OneFriend and TieStrength as the independent and the dependent variables, respectively, is: Training RMSE = 0.31975, Validation RMSE = 0.3213, and R2 = 0.00182. These results confirm our preliminary evaluation on the effectiveness of the variable built as suggested by Shi et al.
Other approaches, such as those proposed by Gilbert and Karahalios [10], Arnaboldi et al. [9], Quijano-Sánchez et al. [8] and Rodríguez et al. [11], use so many FB and/or Twitter features that their models are impossible to reproduce due to the current API restrictions or to reuse in other application contexts. For the same reason, the features proposed by Pappalardo et al. [34] (i.e., cardinality of the actors’ neighborhoods and dimension relevance [36]) could not be obtained in the context of this study. Therefore, it has not been possible to test their proposal against this paper’s, which points out reproducibility issues.
As an additional note, the models presented by Arnaboldi et al. [9] display a larger error than those proposed in this research. Although the variables adopted in their work are not directly associated to the ones presented in this paper, there is a lack of variables representing the “intimacy” between the subjects, which as shown in Section 5.1 is this research’s most significant variable while theirs is recency of communication in FB. Similarly, in Huszti et al.
[33], Petroczi et al. [20] and Lin et al. [31] proposals, the features used in the tie strength computation are so specific to a concrete SN that their approach is not generalizable.
This comparison study lets us conclude that the models presented in this paper indeed allow tie strength to be estimated more accurately and more generally than those proposed so far in the literature, mainly because the predictors we identify for each of the four tie strength components present the following characteristics:

• They are independent of the SN domain.
• They can be successfully represented with proxies that are easy to obtain.
• They provide a lower tie strength estimation error.

Next, the presented insights and models are applied to a concrete tie strength implementation in a financial network.

5.5. Case study: tie strength in a financial network

An example of the application possibilities of this research is now presented through a captation problem, where the goal of a first-rate Spanish financial institution is to acquire new clients and to promote a product among them. To solve this problem, the objective of identifying and quantifying tie strength relationships between all actors in an Enterprise Social Network (ESN) [42] is crucial as it helps to target marketing campaigns according to the idea that trusting relationships lead to greater knowledge exchange [40]. Thus, the immediate goal is to analyze tie strength relationships in the ESN and recommend, among others, the sequence of clients that should be contacted to successfully acquire a target of interest [43].
However, in order to compute the tie strength between two nodes in this ESN a new problem is faced where, differently from previous studies [9,10,33] and mainly due to the network’s domain and size, it is impossible to obtain direct feedback from the network’s actors. That is, it is not possible to obtain real tie strength measures to compare and evaluate a designed model, nor is it possible to obtain more SN information through extra questionnaires beyond that already in possession of the institution.
These limitations can be addressed applying the methodology

Table 16
Relationship types. For each type, the table shows a brief description, the corresponding level of intimacy, and the reciprocity.

in the time span considered. Note that, for a given r ∈ R, $n_{i,j,r}$ is not necessarily symmetrical, e.g., i makes monthly transfers to j to pay the rent but the opposite is not true. An exception to this rule are the relationships identified as symmetrical in Table 16: in this case, $n_{i,j,r} = n_{j,i,r}$, where r is a symmetrical relationship.
To address this problem, the tie strength is estimated as a linear combination of three of the four components suggested: intimacy, intensity and common tastes. Unfortunately, the time component had to be disregarded since no information was given by the financial institution regarding the length of the relationship between clients. As a consequence, Model R1 (see Section 5.3.2) can be applied by identifying valid proxies for its factors within the ESN. The proxies used in this case study are presented in the following.

5.5.1. Intimacy
In Model R1, variable Intimacy is defined as nominal and having seven levels, from zero to six. The degree of intimacy between i and j depends on the type of relationship two actors share and the level of intimacy they reflect. A mapping of the different types of relationships to corresponding levels of intimacy has been defined in joint collaboration with experts of the financial institution and is illustrated in Table 16:

$$\mathrm{INT}(r) \to [0, 1] \qquad (3)$$

where r ∈ R is a type of relationship. The intimacy between i and j is given by the most intimate relationship type that the actors share:

$$\mathrm{Intimacy}_{i,j} = \max_{r \in R \,:\, n_{i,j,r} > 0 \,\lor\, n_{j,i,r} > 0} \{\mathrm{INT}(r)\} \qquad (4)$$

5.5.2. Intensity
In Model R1, variable Intensity is defined as ordinal and having seven levels, from zero to six. The frequency and volume of relationships between two users, $n_{i,j,r}$, represents a valid proxy for the intensity. Hence, the level of intensity between i and j is obtained by rescaling the highest $n_{i,j,r}$ that two users share:

$$\mathrm{Intensity}_{i,j} = \mathrm{round}\Big(6 \cdot \max_{r \in R} \{n_{i,j,r},\, n_{j,i,r}\}\Big) \qquad (5)$$
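Eqs. (4) and (5) can be sketched in a few lines of code. The example below is a hedged illustration only: the relationship types and their intimacy levels are hypothetical (the contents of Table 16 are not reproduced here), the values of n are assumed to be already normalized to [0, 1], and returning 0 when two actors share no relationship is an added assumption, since Eq. (4) is undefined in that case.

```python
# Hypothetical mapping INT(r) from relationship type to intimacy level in [0, 1].
INT = {"joint_account": 1.0, "transfer": 0.4, "same_branch": 0.1}


def intimacy(n_ij, n_ji, int_map=INT):
    """Eq. (4): most intimate relationship type shared in either direction."""
    shared = [r for r in int_map if n_ij.get(r, 0.0) > 0 or n_ji.get(r, 0.0) > 0]
    # Assumption: actors that share no relationship get intimacy 0.
    return max((int_map[r] for r in shared), default=0.0)


def intensity(n_ij, n_ji, relationship_types=tuple(INT)):
    """Eq. (5): rescale the highest n (assumed in [0, 1]) to the levels 0..6."""
    top = max(
        (max(n_ij.get(r, 0.0), n_ji.get(r, 0.0)) for r in relationship_types),
        default=0.0,
    )
    # Note: Python's round() resolves exact .5 ties to the nearest even integer.
    return round(6 * top)


# Example: i transfers money to j; j and i are also linked through a branch.
n_ij = {"transfer": 0.7}
n_ji = {"same_branch": 0.2}
print(intimacy(n_ij, n_ji), intensity(n_ij, n_ji))
```

Keeping the two directions as separate dictionaries mirrors the remark that n is not necessarily symmetrical: a symmetrical relationship type would simply appear with the same value in both.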
show that the analysis presented in this research can be easily reapplied to different contexts and is extremely adaptable to different situations, depending on the availability of proxy variables or real tie strength values.

6. Conclusions and future work

In this paper we have studied the viability of several tie strength models and shown empirically that (1) they provide a lower estimation error than other approaches presented in the literature, (2) they are more general and, as a consequence, (3) they are more reusable. Our models verify Granovetter’s statement about the linearity of tie strength’s components: amount of time, intensity, intimacy and reciprocal services. Besides, generic domain independent and easily detachable predictors have been proposed for each component. Additionally, the research includes an explanatory analysis of the relevance of each component and of several variables that could be related to the tie strength. To the best of the authors’ knowledge, our explanatory analysis is the first one to study the impact of each predictor on the tie strength estimation. Regarding knowledge acquisition, this research has outlined several interesting conclusions related to the close association between tie strength and general persuasiveness, the ability to borrow money and the ability to recommend items. These could serve as starting points for recommender systems, marketing campaigns or other researchers in general. Finally, an example of the applicability of the presented insights and models in different domains has been illustrated through a case study of a financial network comprised of clients’ operations.
Our tests show that the best estimation model proposed in this paper, Model Stepwise, is capable of approximating the tie strength evaluations provided by the participants with a low margin of error (Validation RMSE of 0.1386, approx.). Despite the positive results, a number of potential improvements are still possible. For instance, future research should study the impact of factors that characterize in detail the structure of the network. Given the current limitations imposed by commercial SN, these variables might be difficult to obtain or extremely costly. An evaluation of the trade-off between this cost and the real usefulness of the variables

The authors would like to thank the editor and two anonymous reviewers for their constructive and insightful comments that improved the quality of the paper.

References

[1] M.D. Choudhury, H. Sundaram, A. John, D.D. Seligmann, Analyzing the dynamics of communication in online social networks, in: Handbook of SN Technologies and Applications, 2010, pp. 59–94.
[2] M. Jamali, M. Ester, Using a trust network to improve top-n recommendation, in: International Conference on Recommender Systems, RecSys ’09, 2009, pp. 181–188.
[3] B.O. Holzbauer, B.K. Szymanski, T. Nguyen, A. Pentland, Social ties as predictors of economic development, in: International Conference and School on Network Science, Springer, 2016, pp. 178–185.
[4] G.M. McGuire, W.T. Bielby, The variable effects of tie strength and social resources: how type of support matters, Work Occup. 43 (1) (2016) 3–74.
[5] H. Liang, K.-W. Fu, Network redundancy and information diffusion: the impacts of information redundancy, similarity, and tie strength, Commun. Res. (2016) 1–23.
[6] J. Heidemann, M. Klier, F. Probst, Online social networks: A survey of a global phenomenon, Comput. Netw. 56 (18) (2012) 3866–3878.
[7] J. Golbeck, Generating predictive movie recommendations from trust in social networks, in: International Conference on Trust Management, iTrust ’06, 2006, pp. 93–104.
[8] L. Quijano-Sánchez, B. Díaz-Agudo, J.A. Recio-García, Development of a group recommender application in a social network, Knowledge Based Syst. 71 (2014) 72–85.
[9] V. Arnaboldi, A. Guazzini, A. Passarella, Egocentric online social networks: Analysis of key features and prediction of tie strength in facebook, Comput. Commun. 36 (10-11) (2013) 1130–1144.
[10] E. Gilbert, K. Karahalios, Predicting tie strength with social media, in: International Conference on Human Factors in Computing Systems, CHI ’09, 2009, pp. 211–220.
[11] S.S. Rodríguez, R.P.D. Redondo, A.F. Vilas, Y. Blanco-Fernández, J.J.P. Arias, A tie strength based model to socially-enhance applications and its enabling implementation: mysocialsphere, Expert Syst. Appl. 41 (5) (2014) 2582–2594.
[12] H.C. White, Identity and Control: How Social Formations Emerge, Princeton University Press, 2008.
[13] A. Socievole, F.D. Rango, A.C. Caputo, Opportunistic mobile social networks: from mobility and facebook friendships to structural analysis of user social behavior, Comput. Commun. 87 (2016) 1–18.
[14] R. Xiang, J. Neville, M. Rogati, Modeling relationship strength in online social networks, in: International Conference on World Wide Web, WWW’10, 2010, pp. 981–990.
[15] X.H. Jiliang Tang, H. Liu, Social recommendation: a review, Soc. Netw. Anal. Min. 3 (4) (2013) 1113–1133.
[16] M.O. Nachawati, R. Rabbi, G.E. Yu, L. Kerschberg, A. Brodsky, Social sifter: an agent-based recommender system to mine the social web, in: International Conference on Semantic Technologies for Intelligence, Defense, and Security, STIDS’12, 2012, pp. 125–128.
[17] J. Neville, Ö. Simsek, D.D. Jensen, J. Komoroske, K. Palmer, H.G. Goldberg, Using relational knowledge discovery to prevent securities fraud, in: International Conference on Knowledge Discovery and Data Mining, SIGKDD’05, 2005, pp. 449–458.
[18] P. Domingos, M. Richardson, Mining the network value of customers, in: International Conference on Knowledge Discovery and Data Mining, SIGKDD’01, 2001, pp. 57–66.
[19] R.I.M. Dunbar, V. Arnaboldi, M. Conti, A. Passarella, The structure of online social networks mirrors those in the offline world, Soc. Netw. 43 (2015) 39–47.
[20] A. Petroczi, T. Nepusz, F. Bazso, Measuring tie-strength in virtual social networks, Connections 27 (2) (2007) 39–52.
[21] A. Jøsang, R. Ismail, C. Boyd, A survey of trust and reputation systems for online service provision, Decis. Support Syst. 43 (2) (2007) 618–644.
[22] P. Massa, P. Avesani, Trust-aware recommender systems, in: International Conference on Recommender Systems, RecSys ’07, 2007, pp. 17–24.
[23] W.-P. Lee, C. Kaoli, J.-Y. Huang, A smart TV system with body-gesture control, tag-based rating and context-aware recommendation, Knowl. Based Syst. 56 (2014) 167–178.
[24] N.E. Friedkin, Test of structural features of Granovetter’s strength of weak ties theory, Soc. Netw. 2 (1980).
[25] X. Shi, L. Adamic, M. Strauss, Networks of strong ties, Physica A 378 (1) (2007) 33–47.
[26] N. Lin, P. Dayton, P. Greenwald, Analyzing the instrumental use of relations in the context of social structure, Sociol. Methods Res. 7 (2) (1978) 149–166.
[27] M.S. Granovetter, The strength of weak ties, Am. J. Sociol. 78 (6) (1973) 1360–1380.
[28] K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, N. Christakis, Tastes, ties, and time: a new social network dataset using facebook.com, Soc. Netw. 30 (4) (2008) 330–342.
[29] A. Madan, M. Cebrian, S. Moturu, K. Farrahi, et al., Sensing the “health state” of a community, IEEE Pervasive Comput. 11 (4) (2012) 36–45.
[30] R.S. Burt, Structural Holes: The Social Structure of Competition, Harvard University Press, 1992.
[31] N. Lin, W.M. Ensel, J.C. Vaughn, Social resources and strength of ties: structural factors in occupational status attainment, Am. Sociol. Rev. 46 (4) (1981) 393–405.
[32] T. Hossmann, G. Nomikos, T. Spyropoulos, F. Legendre, Collection and analysis of multi-dimensional network data for opportunistic networking research, Comput. Commun. 35 (13) (2012) 1613–1625.
[33] E. Huszti, B. Dávid, K. Vajda, Strong tie, weak tie and in-betweens: a continuous measure of tie strength based on contact diary datasets, in: International Conference on Applications of Social Network Analysis, ASNA ’13, 2013, pp. 38–61.
[34] L. Pappalardo, G. Rossetti, D. Pedreschi, “how well do we know each other?” detecting tie strength in multidimensional social networks, in: International Conference on Advances in Social Networks Analysis and Mining, ASONAM’12, 2012, pp. 1040–1045.
[35] P.V. Marsden, K.E. Campbell, Measuring tie strength, Soc. Forces 63 (2) (1984) 482–501.
[36] M. Berlingerio, M. Coscia, F. Giannotti, A. Monreale, D. Pedreschi, Foundations of multidimensional network analysis, in: Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on, IEEE, 2011, pp. 485–489.
[37] M.E. Walker, S. Wasserman, B. Wellman, Statistical models for social support networks, in: Advances in Social Network Analysis: Research in the Social and Behavioral Sciences, 1994, pp. 53–79.
[38] D. Liben-Nowell, J.M. Kleinberg, The link-prediction problem for social networks, JASIST 58 (7) (2007) 1019–1031.
[39] B. Taskar, M.F. Wong, P. Abbeel, D. Koller, Link prediction in relational data, in: Advances in Neural Information Processing Systems, NIPS’03, 2003, pp. 659–666.
[40] D.Z. Levin, R. Cross, L.C. Abrams, Why should I trust you? predictors of interpersonal trust in a knowledge transfer context, Acad. Manage. Meeting, 2002.
[41] J. Pagès, Analyse factorielle de données mixtes, Revue de statistique appliquée 52 (4) (2004) 93–111.
[42] K. Berger, J. Klier, M. Klier, A. Richter, “who is key...?” - characterizing value adding users in enterprise social networks, in: European Conference on Information Systems, ECIS ’14, 2014.
[43] L. Quijano-Sanchez, F. Liberatore, The big chase: a decision support system for client acquisition applied to financial networks, Decis. Support Syst. 98 (2017) 49–58.