Computer Communications 110 (2017) 59–74

Contents lists available at ScienceDirect

Computer Communications
journal homepage: www.elsevier.com/locate/comcom

What do we really need to compute the Tie Strength? An empirical study applied to Social Networks
F. Liberatore∗, L. Quijano-Sanchez
UC3M-BS Institute of Financial Big Data, Universidad Carlos III de Madrid, Madrid, Spain

Article info

Article history: Received 13 October 2016; Revised 29 May 2017; Accepted 3 June 2017; Available online 3 July 2017
Keywords: Tie Strength; Relationship Modeling; Social Media Networks; Network Applications

Abstract

Most existing network-based decision-support systems, such as recommender systems, require knowing users’ social context and, thus, the strength of their interactions. However, previous studies related to the usage and estimation of tie strength either assume that this parameter is given or use a computational model of their own. The amount, variety and domain specificity of the information required to apply these models make reproducing and reusing existing results extremely costly or utterly impossible. In our research, we show empirically the relative importance of different social variables for the computation of the tie strength and propose a computational model independent of the Social Networks’ domain. Our experiments are based on a dataset obtained from a survey that involved more than 100 participants and comprised more than 500 social ties. The dataset is the first publicly available dataset to explicitly include tie strength measures.
© 2017 Elsevier B.V. All rights reserved.

∗ Corresponding author.
E-mail addresses: fl[email protected], [email protected] (F. Liberatore), [email protected] (L. Quijano-Sanchez).
http://dx.doi.org/10.1016/j.comcom.2017.06.001
0140-3664/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

With the rising expansion of information technologies known as Social Media (SM), our capacity to interact, collaborate and network has increased greatly and rapidly [1]. Research in a number of academic fields has shown that SM can leverage the way many problems are solved [2–5]. The main reason is that SM can offer new insights and innovative means by targeting information more effectively [6]. Proof of this is the recent use of different social measures in decision-support systems, such as recommender systems, where it has been proven that SM information, along with some specific measures like tie strength estimations, can be used to aid their users in decision-making processes [7,8]. It is on this measure of tie strength, the importance of the social relationship between two individuals [9], that this paper focuses.
In the last decades, academic interest in tie strength has grown substantially both in model design [10–14] and in decision-support systems that use or could benefit from its computation, in the areas of recommender systems [8,15,16], fraud detection [17] or viral marketing [18]. Social Network (SN) users post on their profiles a huge amount of personal information (likes and interests, photos, etc.) that can be analyzed to compute their tie strength with other users [9,19]. One of this paper’s goals is to study the potential of using SNs to extract knowledge that can be used to compute tie strength. Different SNs provide their users with different technical features to interact. Although similar interaction facilities may be found among them or, at least, facilities used for the same purposes, this fact makes the task of obtaining all the social predictors required by the different existing tie strength definitions [8–10,20] very difficult (and sometimes impossible) and, as a consequence, hinders devising a general model to compute tie strength. A simple solution could be directly asking users to rate the tie strength with their contacts [7,21,22]. However, the tasks of tagging and rating are sometimes found tedious and can generate resentment [8,23], hence decreasing the systems’ usability. Besides, in the case of tools without a public interface or Big SNs, asking users to directly rate their tie strength with all their contacts is unaffordable or simply unrealistic, a fact that should be taken into account when designing tie strength estimation.
When needing to compute tie strength, other researchers have given several different definitions according to their research domain, needs and access to the predictors that compose it. For example, it has been affirmed that tie strength could be estimated by the communication reciprocity [24], by the possession of at least one mutual friend [25], by the recency of communication [26] or by the interaction frequency [10,27]. These heterogeneous and unsystematic definitions make the reutilization of others’ conclusions and/or models very difficult. Against this background, this paper aims to perform a thorough analysis of the social predictors that can be used to compute this measure. Also, their importance and
their strength of association are studied, providing guidelines on how to abstract their concept to ensure a feasible and satisfactory computation of the tie strength. The result is a methodology independent of the SN from which social factors can be estimated and a set of conclusions that can be reused by other researchers. Our aim is to propose a general model of tie strength that could be applied to most contexts. Besides, we provide a public dataset obtained from a survey that involved more than 100 participants and thoroughly analyzed more than 500 social ties. This is the first public dataset to explicitly include tie strength measures.¹ We hope that it will be a relevant contribution to researchers in the field and encourage many to pursue further investigations in this subject matter. Finally, we show how the model proposed and the insights drawn in the analysis can be used to obtain an estimation of tie strength in a financial network comprised of clients of a financial institution and their operations and relationships. The estimated strength of the tie between clients finds application in Customer Relationship Management operations, such as identifying influencers to recommend financial products.
In summary, the contributions of this paper are the following: (1) Measuring the strength of association between the tie strength and several SN variables (Section 5.1). (2) Analyzing the relevance of the proposed variables by exploring different approaches to compute the tie strength and studying their estimation precision (Section 5.3). (3) Testing others’ tie strength proposals, their applicability, efficiency and limitations (Sections 2 and 5.4). (4) Introducing a practical example of how to reapply the results of this paper’s study in a financial network (Section 5.5).
The remainder of this paper is structured as follows. The next section reviews previous works related to our research topic. A description of the research questions raised and answered in this piece of research is given in Section 3. Section 4 introduces the details of the novel dataset. Next, in Section 5, different proposals of tie strength models and a variable analysis are presented. A comparison with other models from the literature is illustrated in Section 5.4. The case study on a financial network is the subject of Section 5.5. Finally, Section 6 concludes the paper with insights and future research guidelines.

¹ Lewis et al. [28] provided a public dataset about FB users without any information regarding the tie strength. Also, MIT Human Dynamics Lab provided a public dataset [29] regarding mobile data and social dynamics of several communities, again with no specific tie strength measures.

2. Literature review

The most widely regarded definition of tie strength is Granovetter’s [27]: “The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding) and the reciprocal services which characterize the tie.” This definition has provided a base for many studies that have made use of the concept [35] and has served as a starting point for several researchers [10,12,20]. Thus, this research uses this seminal work as a baseline, analyzes each of these four components and studies whether the best way to compute the tie strength is indeed their linear combination. This starting point, and not more recent proposals [9–11], has been chosen following Petroczi et al.’s [20] justification for the correctness of Granovetter’s approach and to avoid unreproducible approaches, which are either unquantifiable models [13,14,24–26,30–33] or domain specific models [8–11,20]. As illustrated in Table 1, our approach is the only quantitative model that does not have these limitations.
Regarding the four components described by Granovetter: the amount of time is a measurement of the duration of a tie between two nodes; the intensity is defined as the degree, amount of strength or force that something has (Webster’s dictionary); the intimacy is defined as the state of being in a very personal or private relationship (Webster’s dictionary); and concerning the reciprocal services, the term “reciprocal” (of a pronoun) indicates that action is given and received by each subject (Collins dictionary), that is, actions carried out in common between two nodes in an SN.
With these four dimensions as a guide, Gilbert and Karahalios [10] identified 74 Facebook (FB) variables as potential predictors of tie strength. On the other hand, Burt [30] proposed that tie strength could be modeled by structural factors such as the network topology or informal social circles. Xiang et al. [14] proposed an unsupervised model to distinguish strong from weak ties based on profile similarity and interaction activity. Lin et al. [31] stated that tie strength is mainly influenced by social distance, manifested by factors such as socioeconomic status, education level or political affiliation. Recently, Rodríguez et al. [11] have classified tie strength within four different types of social spheres computed through a set of several factors extracted from FB and Twitter, while Arnaboldi et al. [9] have presented quantitative linear models to estimate tie strength from a set of FB variables. Quijano-Sánchez et al. [8] proposed a non-intrusive method to compute tie strength by automatically analyzing users’ FB profiles, as opposed to other works [33] that needed to explicitly ask for the data that make up the tie strength. They concluded that to move from theory [25,27,30,31] to practice [8–10,20] it is important to note that the factors used to compute tie strength are not easy to quantify and are limited by the capabilities of the API from which they are extracted. Also, Hossmann et al. [32] showed, through two datasets obtained from both FB and Twitter, that tie strength is coupled with mobility and communication. In this line, Socievole et al. [13] performed an analysis showing that, in general, FB variables are strongly related to tie strength. Finally, Pappalardo et al. [34] present a quantitative measure of tie strength that, although it has not been validated against real tie strength measures, represents an SN domain independent approach. Albeit theoretically sound, their model needs as input social network variables, such as the cardinality of the neighborhood of all the actors involved or the dimension relevance [36], that due to privacy issues or domain restrictions may not be available in other applications or research settings, therefore limiting its practical applicability.
As illustrated in Tables 1 and 2, the heterogeneous, unsystematic and domain dependent definitions of tie strength make the reutilization of others’ conclusions and/or models very difficult. Petroczi et al. [20] affirmed that Granovetter’s four indicators are the actual components of tie strength, whereas contextual contingencies (communication reciprocity [24], possessing at least one mutual friend [25], recency of communication [26] or social distance [31]) are predictors. Predictors are related to tie strength but are not components of it. This paper focuses on the components and how to identify an SN domain independent predictor for each of them. Besides, although for the last 30 years many attempts have been made to find valid indicators and predictors of tie strength (see Table 2), Table 1 shows how most of these studies’ results [13,14,24–26,30–33] are based on nominal data or binary indicators and, hence, are not suitable for quantitative analysis. That is, most of the studies so far attempt to simply use and apply former knowledge on tie strength rather than try to actually measure these ties [37]. On the other hand, those that do propose quantitative results [8–11,20,34] do not provide a concrete model or, in other cases, a way to quantify or abstract to other contexts the predictors that they use. Also, these works providing quantitative results do not make their datasets publicly available. Without this data it is impossible to reproduce their results. Additionally, these studies make use of specific SN attributes and, therefore, their tie strength computation cannot be extrapolated to other networks. Hence, in order to unify tie strength definitions and avoid the constant creation of new application specific models, this paper’s Section 5 presents an empirical study of which social predictors are really crucial in the tie strength computation and their correlation. It is important to note that the proposed tie strength predictors are context independent and, therefore, can be automatically extrapolated from most SNs, as we later detail.

Table 1
Comparison of different tie strength research papers. In each column, a cross (X) identifies the articles
where: the model uses variables that are specific of FB and therefore not reusable in other domain; the
model uses variables that are specific of other SN (e.g. Twitter) or are specific of a concrete virtual com-
munity and therefore not reusable in other domains; the model is a quantitative approach to estimate tie
strength.

Paper Specific of FB Specific of other SN or domain Quantitative model

Granovetter [27] X
Burt [30] X
Lin et al. [31] X
Friedkin [24] X
Shi et al. [25]
Lin et al. [26] X
Hossmann et al. [32] X X
Socievole et al. [13]
Arnaboldi et al. [9] X X
Xiang et al. [14]
Rodríguez et al. [11] X X X
Gilbert and Karahalios [10] X X
Petroczi et al. [20] X X
Huszti et al. [33] X
Pappalardo et al. [34] X X
Quijano-Sánchez et al. [8] X X
Our goal X

3. Research questions

The goal of this paper is to answer the following research questions:

• RQ1: What are the components of the tie strength?
• RQ2: What are good predictors for those components?
• RQ3: What is the performance of the tie strength model proposed by Granovetter in the context of SM?
• RQ4: In the context of SM, what is the effect of introducing other components than those proposed by Granovetter?
• RQ5: What is the benefit of considering a non-linear model of tie strength in the context of SM?
• RQ6: How effective are other state-of-the-art tie strength approaches?
• RQ7: How reproducible are the obtained conclusions about tie strength in other domains?

Hence, the methodology adopted is detailed in the following:

1. The first step of the research consists of a survey regarding friendships. A voluntary group of self-selected FB users answered a questionnaire comprised of a set of questions regarding the participant, and another set concerning their relationship with five randomly selected FB friends.² Most importantly, participants provide an explicit evaluation of their tie strength with the selected friends. Survey responses are then processed into a dataset that is used to answer the research questions. Section 4 presents the questionnaire and the dataset in detail.
2. Next, the strength of association between the variables obtained from the dataset and the tie strength is measured (Section 5.1). Both nominal and numerical variables are considered, therefore we make use of a methodology that can be applied to both qualitative and quantitative variables, while still providing comparable results. This analysis allows the ranking of the variables according to their strength of association with the tie strength, thus identifying the best single predictor (RQ1).
3. The model proposed by Granovetter is studied in Section 5.3 and its performance measured. This study provides an answer to RQ3. Several models with different combinations of Granovetter’s [27] tie strength components and their equivalent SM predictors are then also tested, thus providing an answer to RQ2.
4. To understand the potential effect of including or dismissing components, in Sections 5.3.2 and 5.3.3 the performance of several models with different variable configurations is tested, thus answering RQ4. Among these models, a Complete model that considers all the variables in the dataset is also studied. Additionally, a stepwise regression is run to study the relevance of each variable in this Complete model.
5. Section 5.3.4 presents results on the effect of estimating tie strength through non-linear models, thus answering RQ5.
6. In Section 5.4, the performance of state-of-the-art tie strength approaches against this paper’s presented models is studied, thus answering RQ6.
7. Finally, in Section 5.5, the insights of this research’s proposed models are applied to compute tie strength measures between clients of a financial network, thus answering RQ7 by performing a case study that shows the applicability of this research’s conclusions in other domains.

² We choose five friends as it is an affordable number of friends to answer the questionnaire about (five minutes each approx.) and also a number of friends most users have.
³ As of now, FB’s API only allows extracting information related to the user that grants permission but not information related to the user’s friends in the network and their mutual interaction. Hence, studies such as [8,9,32] are no longer possible.

4. Dataset description

To answer our research questions, we have designed a questionnaire that assesses different friendship aspects in general and, more specifically, the four components that Granovetter defines as pillars of tie strength computation [27], as well as the predictors of these indicators in FB. Note that the rest of the FB features considered by other researchers in the tie strength computation [8–10,20] are not evaluated as it is not currently possible to automatically extract them from user profiles.³ Besides, as applications outside FB cannot automatically obtain them, results derived from those studies are only reusable for academic purposes in the context of those
experiments, a fact that does not meet our goal (to identify variables as concepts that can be abstracted and then reused in further applications). FB has been chosen as a platform for our research for the following reasons: (i) it is the most well-known and most used SN⁴ and (ii) the API allows for random friendship selection. However, it is important to note that all the extracted variables are directly given by respondents and not extracted from the API. The complete questionnaire is presented in Tables 3 and 4. The goal of the questionnaire is threefold: (I) obtaining a reliable measure of the tie strength directly from the source (respondents), (II) collecting data about variables describing the friendship that could act as predictors for the tie strength, and (III) gathering data on the practical implication that the tie strength could have in other decision-making applications.

Table 2
Comparison of different tie strength research papers. Each column represents whether the presented approach uses any of these variables (in order): intensity (or recency of communication), intimacy, friendship duration, reciprocal services, social distance (such as gender, relationship status, political and religious views), the network topology, number of common friends, FB wall interaction, pictures tagged together, others (such as private inbox messages, or context of personal encounters).
Papers compared: Granovetter [27], Burt [30], Lin et al. [31], Friedkin [24], Shi et al. [25], Lin et al. [26], Hossmann et al. [32], Socievole et al. [13], Arnaboldi et al. [9], Xiang et al. [14], Rodríguez et al. [11], Gilbert and Karahalios [10], Petroczi et al. [20], Huszti et al. [33], Pappalardo et al. [34], Quijano-Sánchez et al. [8].

Participants were initially recruited among the staff and student population of the Universidad Carlos III de Madrid using poster boards and mailing lists. Subsequently, we extended the call for participation through chain messages posted via FB walls. The final sample is a self-selected and voluntary group of 107 FB users. For each respondent, five random FB friends were chosen by an app implemented using the FB API. Each respondent answered seven personal questions (User Questions, see Table 3) and, for each of the randomly chosen friends, 17 questions regarding their relationship (Friendship Questions, see Table 4). The numerical variables are represented as continuous [0,1] variables by scaling them with respect to the minimum and maximum values allowed. On the other hand, ordinal and nominal variables are substituted by a set of dummy variables, one for each level, with the exclusion of the lowest, which is used as the reference level. Note that, in the rest of the paper, whenever referring to a nominal or ordinal variable, we are actually referring to the corresponding set of dummy variables. In the following, a description of the dataset⁵ is provided.
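
The preprocessing just described can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `min_max_scale` and `dummy_code` and the indicator naming scheme (`FBIntensity_4`) are assumptions introduced here, while the answer ranges are the ones listed for Q3 and Q5 in Table 4.

```python
from typing import Dict, Sequence


def min_max_scale(value: float, lowest: float, highest: float) -> float:
    """Scale an answer to [0, 1] using the minimum and maximum values allowed."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= value <= highest:
        raise ValueError(f"{value} is outside [{lowest}, {highest}]")
    return (value - lowest) / (highest - lowest)


def dummy_code(name: str, level: int, levels: Sequence[int]) -> Dict[str, int]:
    """Return one 0/1 indicator per level, except the lowest (reference) level."""
    if level not in levels:
        raise ValueError(f"{level} is not an allowed level of {name}")
    reference = min(levels)
    return {
        f"{name}_{lv}": int(lv == level)  # hypothetical naming scheme
        for lv in sorted(levels)
        if lv != reference
    }


# TieStrength (Q3) is answered on a scale from 0 to 10.
tie_strength = min_max_scale(7, 0, 10)

# FBIntensity (Q5) has levels 0 (Never) to 6 (More than once a day).
fb_intensity = dummy_code("FBIntensity", 4, range(7))

# The reference level is encoded as all indicators set to zero.
never = dummy_code("FBIntensity", 0, range(7))
```

Dropping the lowest level keeps the indicators from being perfectly collinear with an intercept term, which is the usual reason for choosing a reference level in regression models.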

4.1. User questions

The first part of the questionnaire (Table 3) contains questions concerning the respondents. The aim of this part of the questionnaire is to obtain demographic data (CountryRespondent, SexRespondent and AgeRespondent) about the respondents in order to study the relevance of the dataset and the correlation between this data and the tie strength. Also, it allows studying whether the usage of FB (FBUse) affects the answers to the friendship questions. Finally, it is used to measure whether the users’ own perception of their personality (Character) is correlated to their tie strength values, to their susceptibility to others’ opinions, or to their willingness to loan money (again, see Section 5.1).
Regarding the collected data: respondents belong to six different countries: Spain (71), Italy (32), Chile (1), South Korea (1), UK (1) and Dominican Republic (1). Genders are evenly distributed: 53 male and 54 female respondents, and 286 male and 249 female friends. Regarding the age distribution, Fig. 1a shows that it presents similar characteristics to the SN users’ distribution presented in other studies⁶: skewed to the right and with the mode located at the interval 25–34.
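
As a quick consistency check on the counts reported in this section (a worked sum that uses only the figures given here and the survey design of five friends per respondent):

```latex
\[
71 + 32 + 1 + 1 + 1 + 1 = 107 = 53 + 54,
\qquad
107 \times 5 = 535 = 286 + 249 .
\]
```

The resulting 535 friendship records agree with the "more than 500 social ties" stated for the dataset.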


⁴ Please refer to: http://sproutsocial.com/insights/new-social-media-demographics/, last access on 2017/06/05 17:26:51.
⁵ The questionnaire and the pseudonymized dataset can be found at http://portal.uc3m.es/portal/page/portal/ifibid/people/quijano/current_research/, last access on 2017/06/05 17:26:51.
⁶ See http://royal.pingdom.com/2012/08/21/report-social-network-demographics-in-2012/, last access on 2017/06/05 17:26:51.

Table 3
Personal Questions. The table presents the number of the question, the description (including the question type and the allowed responses), the name and the type of the associated variable.

# | Text, Variable Type and Allowed Responses | Variable Name | Variable Type
(A) | Write your name and surname. Open text. | - | -
(B) | Write your country of origin. Open text. | CountryRespondent | Nominal
(C) | Your sex. Multiple choice: (1) W, (0) M. | SexRespondent | Nominal
(D) | Write your age. Open text, accepts values in [0, 100]. | AgeRespondent | Continuous [0,1]
(E) | Write the number of FB friends you have (you can check it in your wall). Open text, accepts values in [0, 500]. | FBFriends | Continuous [0,1]
(F) | Do you feel like a person that gives in in decisions (easily influenced by others’ recommendations) or you do not give in (you are pretty immovable and others’ recommendations do not affect you)? Scale from 0 (Easily influenced) to 10 (Very immovable). | Character | Continuous [0,1]
(G) | Rate your FB usage. Scale from 0 (I use it very little, I rarely log in) to 10 (I use it a lot, I log in every day). | FBUse | Continuous [0,1]

Table 4
Friendship Questions. The table presents the number of the question, the description (including the question type and the allowed responses), the name and the type of the associated variable(s).

# | Text, Variable Type and Allowed Responses | Variable Name | Variable Type
(Q1) | Friend’s age (approx. if you do not know it, or check on their FB page for this info). Open text, accepts values in [0, 100]. | AgeFriend | Continuous [0,1]
(Q2) | Friend’s sex. Multiple choice: (1) W, (0) M. | SexFriend | Nominal
(Q3) | Rate your closeness to this person. The strength of your relationship/tie. Scale from 0 (Weak, not very strong) to 10 (Very strong). | TieStrength | Continuous [0,1]
(Q4) | What is your relationship with this person? Choose the most relevant option. Multiple choice: (6) Spouse or partner, (5) Relative, (4) Close friend, (3) Friend, (2) Co-worker, (1) Acquaintance, (0) Unknown. | Intimacy | Nominal
(Q5) | How often do you communicate with this person on Facebook (e.g., by private message, by commenting on the wall, by commenting pictures, by chat)? Select the most restrictive option. Multiple choice: (6) More than once a day, (5) On a daily basis, (4) On a weekly basis, (3) On a monthly basis, (2) Every 3 months, (1) Less often than every 3 months, (0) Never. | FBIntensity | Ordinal
(Q6) | How often do you see each other? Select the most restrictive option. Multiple choice: (6) More than once a day, (5) On a daily basis, (4) On a weekly basis, (3) On a monthly basis, (2) Every 3 months, (1) Less often than every 3 months, (0) Never. | RealIntensity | Ordinal
(Q7) | How many common friends do you have in FB (you can check this by looking at his/her profile and in the “friends” window located in the left hand side)? Open text, accepts values in [0, 5000]. | CommonFriends | Continuous [0,1]
(Q8) | How long have you known each other (in years, approx.)? Open text, accepts values in [0, 100]. | RealDuration | Continuous [0,1]
(Q9) | How long have you been FB friends (in years, approx.) (you can check this by looking at his/her profile and in the “information” window located at the left hand side)? Multiple choice: 0, ..., 13. | FBDuration | Continuous [0,1]
(Q10) | Select approximately the percentage of common tastes that you share. Scale from 0 (0%) to 10 (100%). | CommonTastes | Continuous [0,1]
(Q11) | Do you have contact with this person in other SN? Check all that apply. Checkboxes: Whatsapp, Twitter, Instagram, Flickr, Google +, Others, None. | Whatsapp, Twitter, Instagram, Flickr, Google, Others | Nominal
(Q12) | When doing a joint activity, how much do this person’s wishes influence your choices? Scale from 0 (very little) to 10 (a lot). | Influence | Continuous [0,1]
(Q13) | How much would you trust a movie recommendation from her/him? Scale from 0 (very little) to 10 (a lot). | Movie | Continuous [0,1]
(Q14) | How much would you trust a restaurant recommendation from her/him? Scale from 0 (very little) to 10 (a lot). | Restaurant | Continuous [0,1]
(Q15) | How much would you trust a FB app recommendation from her/him? Scale from 0 (very little) to 10 (a lot). | App | Continuous [0,1]
(Q16) | How much would you trust a recommendation from her/him for migrating to another SN? Scale from 0 (very little) to 10 (a lot). | SN | Continuous [0,1]
(Q17) | How much money would you loan to this person if asked? Multiple choice: (0) Nothing, (1) 10 euros/pounds/dollars or less, (2) 100 euros/pounds/dollars or less, (3) 1000 euros/pounds/dollars or less, (4) 10,000 euros/pounds/dollars or less, (5) more than 10,000 euros/pounds/dollars. | Loaning | Ordinal

4.2. Friendship questions

The second part of the questionnaire, repeated for each of the five randomly chosen FB friends, assesses the relationship between
the respondent and the friend (for the questions, please refer to Table 4). Answers are reflected in the following variables, which can be roughly divided into four subgroups according to their function:

4.2.1. Friend description variables
AgeFriend, SexFriend (Q1 and Q2, respectively). These variables provide some personal demographic information on the friend considered.

4.2.2. Tie strength variable
TieStrength (Q3) reflects the real tie strength between users. This is the dependent variable used in this paper’s prediction model study. This variable’s distribution, shown in Fig. 1b, presents a shape that resembles that of the corresponding variable in Gilbert and Karahalios [10] (“How strong is your relationship?”).

4.2.3. Friendship descriptors variables
Intimacy, FBIntensity, RealIntensity, CommonFriends, RealDuration, FBDuration, CommonTastes, Whatsapp, Twitter, Instagram, Flickr,
Google, Others (Q4–Q11, respectively). These variables assess the relationship between the participant and the friend. They are studied as possible predictors of the tie strength. As mentioned before, one of this paper’s goals (see Section 3) is to analyze the four features that Granovetter defined as pillars of tie strength computation (amount of time, intensity, intimacy and reciprocal services) and whether the best way to compute the tie strength is indeed a linear combination of them or whether, on the other hand, other approaches and features provide a better estimation:

- To measure the amount of time respondents have been asked to directly answer how long they have known their friends (Q8,
ally expressed [20,33], the frequency of appearance in photos [8,10,11], distance between hometowns, or number of common friends [10]. With the aim of being able to reuse the variable’s definition in any SN theme or domain, it is necessary to go back to the core of the definition of “personal relationship” (see Section 2), and to study how the scale that this definition implies can be estimated within any SN, that is, to estimate factors that predict the type of personal relationship. Hence, the seven different types of personal relationships that exist (partner, relative, close friend, friend, co-worker, acquaintance and unknown) have been analyzed and in Q4 (variable Intimacy) respondents have directly given this value. This is done to thoroughly study its relevance by having a good estimation of this definition and because this information can generally be extracted from any SN: FB, for example, already tags users with their level of intimacy; similarly, Enterprise SNs (ESN) also store information regarding partnership, family or co-working bonds for marketing purposes.
- Last, regarding intensity, researchers have associated this value with different measures that encompass frequency of communication [8–10,33]. Hence, a reusable and domain independent way to define it is, in general terms, as frequency of communication. Thus, respondents have provided both this measure’s real value (Q6, variable RealIntensity) and its FB estimation (Q5, variable FBIntensity).

With the variables mentioned so far, the main pillars of Granovetter’s definition of tie strength are covered and defined in a generalized way that allows a domain independent estimation and thus their reusability. Besides, some FB estimations of these components have been included for viability purposes. Finally, to study the relevance of other SNs, respondents have been asked if, besides being FB friends, they are related in other SNs (Q11, variables Whatsapp, Twitter, Instagram, Flickr, Google, Others).

Fig. 1. Dataset Histograms.

4.2.4. Recommendation susceptibility variables
Influence, Movie, Restaurant, App, SN, Loaning (Q12–Q17, respectively). These variables are used to measure the practical impact of the tie strength in terms of acceptance of a recommendation and can be used to understand the implications of the tie strength both inside and outside the context of measurement. These variables measure whether the user can be influenced by friends’ wishes/tastes in a decision-making process (Q12, variable Influence), whether she would follow their recommendation in movies, food, FB applications or an SN (Q13–Q16, variables Movie, Restaurant, App, SN) and, finally, the amount of money that she would loan them (Q17, variable Loaning).
The next section studies different models that estimate the tie
variable RealDuration). Besides, to test whether it is possible to strength, their suitability and the strength of association between
estimate this variable through a representation in a given SN, the variables that conform it.
and to ascertain if it is a good enough estimation in the calcula-
tion of the tie strength, respondents have been asked to indicate 5. Analysis and results
how long they have known each other in FB (Q9).
- Regarding reciprocal services, related to FB, others have com- This section presents a quantitative analysis on the tie strength,
puted it as the number of applications that users have in com- the variables considered, and the estimation models. First, we ana-
mon [10] or as overlapping features in FB’s profiles [8]. Being lyze the correlation between the tie strength and the studied vari-
our goal to test and provide a reusable definition that can be ables and draw some interesting conclusions.
used to estimate this component in all sorts of SN, and follow-
ing state-of-the-art research that affirm that attribute similarity 5.1. Strength of association analysis
can improve link prediction models [28,38,39], Q10 and variable
CommonTastes reflect and capture this essence by measuring in The analysis of the strength of association between a predictor
general two actors percentage of common tastes. This definition and the tie strength provides a general idea of the level of linear
can be easily abstracted to other domains. relationship that two variables share, thus laying the ground for
- Regarding intimacy, its definition (see Section 2) suggests esti- more complex analyses. Note that for this evaluation a measure
mating it through variables that indicate the type of personal that can be computed for different types of variables (i.e., con-
relationship. Others have estimated this variable by measuring tinuous, nominal and ordinal) is needed. Thus, the methodology
several features like: the degree of intimacy in wording mutu- adopted runs as follows:

For each variable a simple linear regression model to predict
the tie strength using the incumbent variable as predictor is es-
timated. For nominal and ordinal variables the model becomes a
multiple regression, as they are replaced by the corresponding set
of dummies. Part of the output of the model is the coefficient of
determination R², from which the correlation coefficient R can be obtained.
It is important to notice that this value is not the correlation be-
tween the tie strength and the incumbent variable. What it actu-
ally represents is the correlation between the observed tie strengths
and those predicted by the model. However, for continuous vari-
ables, this approach is similar to computing the Pearson correla-
tion. In fact, the coefficient R obtained by the simple linear regres-
sion model is equivalent to the absolute value of the Pearson cor-
relation between the considered variable and the tie strength. On
the other hand, for ordinal and nominal variables, this method is
equivalent to a one-way ANOVA, and the obtained R² is equivalent
to η².
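
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, and the helper names (`r_squared`, `dummies`, `association_r`) are hypothetical. It checks numerically that R coincides with the absolute Pearson correlation for a continuous predictor, and that R² coincides with η² for a categorical one.

```python
import numpy as np

def r_squared(design: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least squares fit of y on `design` plus an intercept."""
    X = np.column_stack([np.ones(len(y)), design])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def dummies(levels: np.ndarray) -> np.ndarray:
    """One 0/1 column per level, dropping the first level as the reference."""
    values = np.unique(levels)
    return np.column_stack([(levels == v).astype(float) for v in values[1:]])

def association_r(x: np.ndarray, y: np.ndarray, categorical: bool) -> float:
    """Strength of association R between one predictor and the response."""
    design = dummies(x) if categorical else x.reshape(-1, 1).astype(float)
    return float(np.sqrt(max(r_squared(design, y), 0.0)))

# Synthetic stand-in for the survey data (hypothetical, not the paper's dataset).
rng = np.random.default_rng(0)
n = 200
common_tastes = rng.uniform(0, 1, n)   # continuous predictor
intimacy = rng.integers(0, 7, n)       # ordinal predictor with 7 levels
tie = 0.4 * common_tastes + 0.05 * intimacy + rng.normal(0, 0.1, n)

# Continuous predictor: R should match |Pearson correlation|.
r_cont = association_r(common_tastes, tie, categorical=False)
pearson = float(np.corrcoef(common_tastes, tie)[0, 1])

# Categorical predictor: R^2 should match eta^2 from a one-way ANOVA.
r_cat = association_r(intimacy, tie, categorical=True)
grand_mean = tie.mean()
ss_between = sum(
    (intimacy == g).sum() * (tie[intimacy == g].mean() - grand_mean) ** 2
    for g in np.unique(intimacy)
)
eta_squared = float(ss_between / ((tie - grand_mean) ** 2).sum())

print(f"continuous:  R = {r_cont:.4f}, |Pearson| = {abs(pearson):.4f}")
print(f"categorical: R^2 = {r_cat ** 2:.4f}, eta^2 = {eta_squared:.4f}")
```

Treating ordinal variables through dummies, as the text does, ignores the ordering of the levels; this keeps a single measure usable across variable types at the cost of some statistical power.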
The coefficients R can be used to compare the strength of asso-
ciation between the predictors and the tie strength. Fig. 2 illustrates
the values of the coefficients R (sorted in non-increasing order) di-
vided in groups according to the scope of the variables.
From the figure it can be easily seen that the user variables are
not associated to the tie strength. This is an interesting result as
we would have expected that the (perceived) personality traits of
the participants (Character) should have affected the tie strength.
However, all the R coefficients take values close to zero. Also note
that, contrary to what one might expect, the use of
FB (FBUse) is not correlated with any of the FB-related variables
(FBIntensity, FBDuration, FBFriends), nor is the number of FB friends
(FBFriends) correlated with the participants' average tie strength.
With respect to the friendship descriptors, the variable hav-
ing the strongest association is Intimacy, followed by CommonTastes and
the Intensity variables (both the real and the FB surrogate). There-
fore, these variables are expected to be very significant in a lin-
ear model of tie strength. Interestingly, the fact that two individ-
uals communicate via Whatsapp is more (linearly) related to the
tie strength than the time that they have known each other (or the
FB surrogate). Regarding whether using specific network topology
measures is relevant to the computation of tie strength, the corre-
lation results illustrated in Fig. 2a and b show that neither the
nodes' degree (FBFriends) nor the size of the shared community
(CommonFriends) is significant. Future research should investigate
in more detail the relevance of including further network topology
measures in tie strength computation.
With respect to the recommendation susceptibility of the par-
ticipants, according to Fig. 2c the tie strength is strongly associ-
ated to the general persuasiveness (Influence), the ability to borrow
money (Loaning), and successfully recommending movies (Movie).
This interesting result provides a valuable argument on an open
question in the literature [8], namely whether the impact of tie strength
decays outside of the context of measurement, and is sup-
ported by Levin et al.'s [40] statement that tie strength and trust are
correlated. Further support for this claim is given by the associ-
ation of the variables Restaurant, App and SN, which show a moder-
ate linear relationship with the tie strength. Interestingly, SN has
the lowest correlation among all the recommendation variables,
despite belonging to the same context of the survey's medium (i.e., FB). Post-survey interviews have revealed that, in general, participants are not interested in recommendations of SNs and apps.

Fig. 2. Strength of association between the tie strength and the other variables.

These preliminary results show the potential applications that an accurate evaluation of the tie strength has for decision-support systems such as recommender systems, businesses or companies. In fact, it can be used to identify key nodes in a SN that can be exploited to broadcast in the most effective way. Future research should investigate in more detail the practical impact of the tie strength on the power of recommendation and the areas where it is most relevant.
When comparing these results to those in the literature [9,32], this paper's research variables (see Fig. 2) seem to correlate better with the tie strength than those proposed by Arnaboldi et al. [9]. However, it is not possible to directly associate these variables to theirs, as they are more generic and broader in scope. As an example, RealIntensity includes the following set of variables introduced by Arnaboldi et al. [9]: "Number of alters' pictures in which ego appears," "Number of ego's pictures in which alters appear" and "Number of events in common".
Regarding the research by Hossmann et al. [32], their results are not directly comparable due to the differences in experiment and variable design. However, similar conclusions are reached: Hossmann et al. [32] detect that tie strength is correlated to communication and physical closeness, which relates to the high correlation between FBIntensity and RealIntensity with tie strength shown in this analysis.

5.2. Factor analysis

To analyze the nature of the variables in the dataset, a factor analysis has been carried out on the data. The objective of this analysis is to ascertain whether or not the factors that compose the variables are able to explain the tie strength dimensions hypothesized by Granovetter. Due to the presence of both qualitative and quantitative variables, Factor Analysis of Mixed Data (FAMD) [41] has been applied. In fact, traditional Factor Analysis (or Principal Component Analysis) could not be applied as it requires all the variables to be quantitative. Similarly, Multiple Correspondence Analysis is used for qualitative variables. On the other hand, FAMD can be applied when quantitative and qualitative variables are simultaneously present as active elements. For the sake of this analysis, only the predictive variables (user and friendship) have been considered.
The first four factors (or dimensions) identified by the method are studied. Table 5 illustrates the variance associated to each dimension, as well as the percentage of variance and the cumulative percentage of variance. The total variance explained by the four factors is very low, 25% approx., suggesting that they are not representative of the whole dataset. Tables 6 and 7 show the coordinates and the contribution in percentage of the variables in the hidden factors, respectively. Higher values in Table 7 correspond to variables having a greater importance in the definition of the factor. In summary, the first four hidden dimensions are characterized by the following variables:

• Factor 1: Intimacy, FBIntensity, RealIntensity, RealDuration, AgeFriend, AgeRespondent, CommonTastes. This factor is concerned with long, intense and intimate relationships between older users that share many interests.
• Factor 2: AgeFriend, AgeRespondent. This factor regards mostly friendships between older users.
• Factor 3: RealIntensity, FBIntensity, Intimacy. The focus of this factor is on very intense and intimate relationships.
• Factor 4: Intimacy, RealIntensity, CountryRespondent, FBIntensity. This factor concerns intimate and intense relationships and discriminates on the country of origin of the respondent.

Table 5
Eigenvalues, percentage of variance and cumulative percentage of variance of the hidden factors.

                        Dim.1     Dim.2     Dim.3     Dim.4
Variance                3.8769    2.6875    1.9840    1.8157
% of var.               9.6923    6.7187    4.9601    4.5393
Cumulative % of var.    9.6923   16.4109   21.3710   25.9103

Table 6
Variables' coordinates in the hidden factors.

                     Dim.1    Dim.2    Dim.3    Dim.4
AgeRespondent        0.3984   0.3859   0.0034   0.0437
FBFriends            0.1555   0.1346   0.0271   0.0742
Character            0.0082   0.0064   0.0354   0.1543
FBUse                0.0040   0.0526   0.0789   0.1201
AgeFriend            0.3984   0.3859   0.0034   0.0437
CommonFriends        0.0192   0.2642   0.0514   0.0095
RealDuration         0.4165   0.0930   0.0006   0.0110
FBDuration           0.0047   0.0018   0.0042   0.0092
CommonTastes         0.3765   0.1442   0.0076   0.0021
CountryRespondent    0.0823   0.0852   0.1120   0.2480
SexRespondent        0.0070   0.0066   0.0564   0.0840
SexFriend            0.0058   0.0003   0.0265   0.0147
Intimacy             0.5861   0.2352   0.4346   0.2888
FBIntensity          0.5734   0.2501   0.4537   0.2400
RealIntensity        0.4751   0.2079   0.5665   0.2655
Whatsapp             0.2852   0.1997   0.0499   0.0240
Twitter              0.0119   0.0709   0.0093   0.0477
Instagram            0.0174   0.1485   0.0619   0.0239
Flickr               0.0002   0.0029   0.0001   0.0012
Google               0.0027   0.0015   0.0011   0.0261
Others               0.0484   0.0099   0.0001   0.0842

Table 7
Variables' contributions in percentage in the hidden factors.

                     Dim.1     Dim.2     Dim.3     Dim.4
AgeRespondent        10.2771   14.3600    0.1692    2.4075
FBFriends             4.0106    5.0090    1.3677    4.0850
Character             0.2121    0.2392    1.7846    8.4971
FBUse                 0.1041    1.9577    3.9752    6.6141
AgeFriend            10.2771   14.3600    0.1692    2.4075
CommonFriends         0.4951    9.8320    2.5898    0.5206
RealDuration         10.7422    3.4616    0.0305    0.6083
FBDuration            0.1209    0.0666    0.2130    0.5067
CommonTastes          9.7114    5.3657    0.3825    0.1181
CountryRespondent     2.1235    3.1703    5.6468   13.6603
SexRespondent         0.1795    0.2468    2.8448    4.6236
SexFriend             0.1506    0.0117    1.3336    0.8094
Intimacy             15.1178    8.7517   21.9054   15.9041
FBIntensity          14.7896    9.3052   22.8691   13.2172
RealIntensity        12.2552    7.7369   28.5507   14.6200
Whatsapp              7.3554    7.4295    2.5152    1.3193
Twitter               0.3062    2.6383    0.4682    2.6255
Instagram             0.4476    5.5239    3.1220    1.3153
Flickr                0.0048    0.1070    0.0062    0.0658
Google                0.0706    0.0568    0.0534    1.4359
Others                1.2486    0.3700    0.0028    4.6391

The factors identified by the FAMD present several overlaps. For instance, FBIntensity and RealIntensity have a strong effect on factors 1, 2, and 3. Given the lack of independence between the hidden factors, we can conclude that they cannot explain the tie strength dimensions hypothesized by Granovetter. However, this does not mean that the factor description given by Granovetter may not be accurate for tie strength estimation (this is the object of the analysis carried out in Section 5.3). What it does say is that Granovetter's factors cannot fully characterize a friendship relationship. In fact, the objective of Factor Analysis is to describe variability in the data in terms of unobserved variables. Thus, the characteristics of a friendship are determined by more elements than intimacy, intensity, common tastes, and duration. However, the first factor identified by the analysis (i.e., the factor having the highest explained variability) includes exactly Granovetter's factors, plus both the respondent's and the friend's ages. Therefore, this suggests that the characteristics of a friendship are, indeed, principally determined by the tie strength (as defined by Granovetter) and the age of the persons involved. This is evidence of the primary role of tie strength in the definition of friendships and relationships between people.

5.3. Linear model

As mentioned before, Granovetter [27] proposed that the tie strength is a linear combination of four factors: the amount of time, the emotional intensity, the intimacy (mutual confiding) and the
reciprocal services. In this section we assess this statement, provide an empirical evaluation of the goodness of the linear model described by Granovetter, analyze the importance of each variable and, finally, measure the impact of disregarding or substituting some of the variables with surrogates extrapolated from a SN (in our case, FB).
In mathematical terms, a linear combination of variables can be expressed as:

$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i \qquad (1)$$

where $Y$ is the dependent variable, $X_i$ are the independent variables, $\beta_0$ is the intercept and $\beta_i$ are coefficients associated to the variables. Given a sample $(Y, X_1, \ldots, X_p)$, the best beta coefficients can be obtained by means of linear regression. It is important to remark that, as in the case of the strength of association analysis (Section 5.1), the nominal and the ordinal variables are replaced by the corresponding set of dummies. The following models are considered and analyzed:

1. X=(RealDuration, RealIntensity, Intimacy, CommonTastes).
2. X=(FBDuration, RealIntensity, Intimacy, CommonTastes).
3. X=(RealDuration, FBIntensity, Intimacy, CommonTastes).
4. X=(FBDuration, FBIntensity, Intimacy, CommonTastes).

In Model 1 the known variables provided by the respondent are used as proxies of the factors identified by Granovetter. In the other models, some of these variables are substituted by their FB proxy. This allows us to assess the impact of surrogating the variables with measurements drawn from a given SN in general and from FB in particular.

Table 8
Linear models' beta coefficients. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (***) p < 0.001; (**) 0.001 ≤ p < 0.01; (*) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.

Model 1
Variable          Coefficient Value
(Intercept)       −0.0341197319   ()
Intimacy1         −0.0102971916   ()
Intimacy2          0.0424881889   ()
Intimacy3          0.1797162256   (***)
Intimacy4          0.3662260574   (***)
Intimacy5          0.355364953    (***)
Intimacy6          0.4091906799   (***)
RealIntensity1     0.1125199331   (***)
RealIntensity2     0.1748226203   (***)
RealIntensity3     0.1618258431   (***)
RealIntensity4     0.2514224171   (***)
RealIntensity5     0.244148771    (***)
RealIntensity6     0.2862918495   (***)
RealDuration       0.17500678     (*)
CommonTastes       0.4217724309   (***)

[Models 2–4: the table is rotated in the source and the pairing of each value with its variable is not recoverable; values are listed in source order.]

Model 2
Variables: (Intercept), Intimacy1–Intimacy6, RealIntensity1–RealIntensity6, FBDuration, CommonTastes.
Coefficient values: −0.0050998652, −0.0429078035, 0.0532583283, 0.2609923324, 0.3809810828, 0.1840003987, 0.2543776789, 0.0420821607, 0.3880057217, 0.4186597779, 0.1168586366, 0.1847872521, 0.286301221, 0.416785887, 0.165428154.
Significance markers: 11 (***), 4 ().

Model 3
Variables: (Intercept), Intimacy1–Intimacy6, FBIntensity1–FBIntensity6, RealDuration, CommonTastes.
Coefficient values: −0.0236147212, 0.4531563843, 0.0577809773, 0.5621640378, 0.1534975292, 0.0829754184, 0.4409749215, 0.1098295621, 0.1100556056, 0.1782106087, 0.4161323728, 0.0241130337, 0.1295191388, 0.237625435, 0.114200989.
Significance markers: 10 (***), 1 (**), 1 (*), 3 ().

Model 4
Variables: (Intercept), Intimacy1–Intimacy6, FBIntensity1–FBIntensity6, FBDuration, CommonTastes.
Coefficient values: −0.0076339551, −0.0475174238, 0.0586062806, 0.0336837256, 0.2514905068, 0.1053849084, 0.1352654431, 0.4801696186, 0.4684718135, 0.4121393382, 0.1893072821, 0.5712157603, 0.0911373613, 0.1113889351, 0.165236017.
Significance markers: 11 (***), 1 (*), 3 ().

Table 9
Linear models' training RMSE, validation RMSE, and R².

Model      Training RMSE   Validation RMSE   R²
Model 1    0.1417802897    0.1454339072      0.8041160008
Model 2    0.1423034361    0.1459978679      0.8026677733
Model 3    0.1510828348    0.1552700608      0.777567884
Model 4    0.1511535663    0.1552256484      0.777359566

Table 10
95% confidence intervals for the beta coefficients of Model 1.

Variable          2.5 %            97.5 %
(Intercept)       −0.0982142886    0.0299748247
Intimacy1         −0.078543014     0.0579486307
Intimacy2         −0.0332032187    0.1181795966
Intimacy3          0.1052008861    0.2542315651
Intimacy4          0.277665226     0.4547868888
Intimacy5          0.2675193158    0.4432105902
Intimacy6          0.2814507797    0.5369305801
RealIntensity1     0.079812281     0.1452275852
RealIntensity2     0.1262224958    0.2234227447
RealIntensity3     0.1109521513    0.212699535
RealIntensity4     0.1977760062    0.305068828
RealIntensity5     0.1654824886    0.3228150534
RealIntensity6     0.1772514065    0.3953322926
RealDuration       0.0345522307    0.3154613292
CommonTastes       0.3640122038    0.479532658

Fig. 3. 95% confidence intervals for the variables Intimacy and RealIntensity in Model 1. In the figures, the solid dots represent the coefficient value while the empty circles define the confidence interval limits.

Table 8 presents the linear models' beta coefficients and their significance represented by the number of asterisks next to each beta coefficient value, as explained in the table's legend. Intimacy1–Intimacy6 and Intensity1–Intensity6 refer to the dummy variables representing the level specified by the number in the name. The performance of the models throughout the paper is studied by means of three measures: the training RMSE (Root Mean Square Error), computed on a model estimated using all the observations, the validation RMSE, computed by applying leave-one-out crossvalidation on the observations, and the R². Table 9 shows these values for the models presented so far. From the tables a number of considerations can be drawn:

- Regarding Model 1, all the variables identified by Granovetter are indeed significant to predict the tie strength. The training and the validation error are similar, 0.14 approx., meaning that on a scale from zero to 10 predicted tie strengths are about one and a half points away from the real value, on average. By observing the coefficient values it can be seen that the variable having the greatest impact on the tie strength is CommonTastes (0.42 approx.), followed by Intimacy6 (0.41 approx.), representing an intimate sentimental relationship. Apparently the only variable that penalizes the tie strength is Intimacy1, which identifies relationships between acquaintances. This is an interesting result as we would have expected the lowest level of tie strength to be between strangers. However, results suggest that, in terms of tie strength, it is better to not know a person rather than being just an acquaintance.
- For all the linear models, the training and the validation RMSEs are very similar, suggesting that there is no overfitting in the models.
- The surrogate FB variables provide a good approximation to the real variables. In fact, in Model 4 the difference in terms of training and validation RMSE is smaller than 0.01 (a 6.73% validation RMSE gap approx.). This result suggests that SNs can indeed provide information on the users that can then be used as proxy for the real values. This agrees with the results presented by Socievole et al. [13].
- In general, the duration variables are the least significant.⁷ This agrees with the results of the association analysis (see Section 5.1).
- However, if tie strength had to be linearly predicted with just one variable, then the most associated one should be chosen, which in our particular case (see Section 5.1) would be Intimacy followed by CommonTastes.
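
The evaluation protocol used in this section (a linear model on dummy-coded predictors, scored by training RMSE and leave-one-out validation RMSE) can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the generating coefficients are arbitrary, and the helper names (`one_hot`, `fit_ols`, `training_rmse`, `loocv_rmse`) are hypothetical. The design matrix mirrors the structure of Model 1: an intercept, six dummies for each of the two seven-level variables, and two continuous variables, for 15 coefficients in total.

```python
import numpy as np

def one_hot(levels, n_levels):
    """Dummy columns for levels 1..n_levels-1 (level 0 is the reference)."""
    return np.column_stack([(levels == k).astype(float) for k in range(1, n_levels)])

def fit_ols(X, y):
    """Least-squares beta coefficients for y = X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rmse(errors):
    """Root mean square of a vector of prediction errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

def training_rmse(X, y):
    """RMSE of a model estimated using all the observations."""
    return rmse(y - X @ fit_ols(X, y))

def loocv_rmse(X, y):
    """RMSE of leave-one-out predictions: each row is predicted by a
    model fitted on all the other rows."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        errors[i] = y[i] - X[i] @ fit_ols(X[keep], y[keep])
    return rmse(errors)

# Synthetic stand-in for the survey data (hypothetical, not the paper's dataset).
rng = np.random.default_rng(42)
n = 300
intimacy = rng.integers(0, 7, n)        # levels 0..6
intensity = rng.integers(0, 7, n)       # levels 0..6
duration = rng.uniform(0, 1, n)
common_tastes = rng.uniform(0, 1, n)
tie_strength = (0.06 * intimacy + 0.04 * intensity + 0.15 * duration
                + 0.4 * common_tastes + rng.normal(0, 0.1, n))

# Intercept + 6 intimacy dummies + 6 intensity dummies + 2 continuous columns.
X = np.column_stack([np.ones(n), one_hot(intimacy, 7), one_hot(intensity, 7),
                     duration, common_tastes])

train = training_rmse(X, tie_strength)
valid = loocv_rmse(X, tie_strength)
print(f"training RMSE = {train:.4f}, validation RMSE = {valid:.4f}")
```

For ordinary least squares the leave-one-out RMSE can never be smaller than the training RMSE, so a small difference between the two, as reported in Table 9, is the pattern one would expect from a model that does not overfit.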
5.3.1. Beta coefficients' confidence intervals
Some interesting insights can be drawn by studying the confidence intervals associated to the beta coefficients in Model 1 (see Table 10). Fig. 3 provides a graphical representation. By observing the table and the figures, it can be seen that the levels of the variable Intimacy could be aggregated in two independent groups, namely {(6) Spouse or partner, (5) Relative, (4) Close friend} denoting a strong relationship, and {(3) Friend, (2) Co-worker, (1) Acquaintance, (0) Unknown} that represent relationships that have a low level of tie strength. As a consequence, if too costly to obtain, researchers could narrow down the Intimacy variable to two dimensions, i.e. high and low. On the other hand, it is not possible to clearly cluster the levels of the variable RealIntensity in different groups as their confidence intervals overlap.

5.3.2. Testing for variable elimination
The goal of the next study is to understand the impact of disregarding a variable when computing the tie strength. Since obtaining the variables' (or the proxies') value could be costly (or even impossible), this analysis helps a decision-maker to choose how to invest limited resources in data collection operations. The following reduced models are analyzed:

R1. X=(RealIntensity, Intimacy, CommonTastes).
R2. X=(RealDuration, Intimacy, CommonTastes).
R3. X=(RealDuration, RealIntensity, CommonTastes).
R4. X=(RealDuration, RealIntensity, Intimacy).

Each reduced model presents only three of the four variables. The results of the experiments are illustrated in Tables 11 and 12. According to Table 11, the performance of the linear model decreases the most when Intimacy and CommonTastes are not included in the linear model (gap of 20% approx.). In fact, the validation RMSEs jump to 0.18 and 0.17, respectively, corresponding to a validation RMSE gap of 22.6% and 17.8%, respectively. On the other hand, disregarding RealDuration does not have a major impact on the performance of the linear model (validation RMSE gap of 0.4% approx.), suggesting that this variable could be removed if too expensive to obtain. These results agree with the insights drawn in the Strength of Association Analysis (see Section 5.1).

Table 11
Reduced models' training RMSE, validation RMSE, and R².

Model       Training RMSE   Validation RMSE   R²
Model R1    0.1425947974    0.14601552        0.8018588844
Model R2    0.157154153     0.1598006565      0.7593316572
Model R3    0.1751020714    0.1782862266      0.7012211783
Model R4    0.1675015779    0.1712608702      0.7265958844
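
The validation RMSE gaps quoted in this section can be reproduced from the values in Tables 9 and 11. The sketch below assumes that the gap is the relative increase in validation RMSE with respect to Model 1; the text does not spell this definition out, but it is consistent with the quoted percentages. The function name `gap_percent` is hypothetical.

```python
# Validation RMSE of Model 1 (Table 9), used as the baseline.
BASELINE = 0.1454339072

# Validation RMSEs copied from Table 9 (Model 4) and Table 11 (reduced models).
validation_rmse = {
    "Model 4": 0.1552256484,   # FBDuration and FBIntensity as surrogates
    "Model R1": 0.14601552,    # RealDuration removed
    "Model R2": 0.1598006565,  # RealIntensity removed
    "Model R3": 0.1782862266,  # Intimacy removed
    "Model R4": 0.1712608702,  # CommonTastes removed
}

def gap_percent(rmse: float, baseline: float = BASELINE) -> float:
    """Relative increase of a validation RMSE over the baseline, in percent."""
    return 100.0 * (rmse - baseline) / baseline

gaps = {name: gap_percent(value) for name, value in validation_rmse.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}%")
```

Under this definition, removing RealIntensity (Model R2) costs roughly 10%, which places it between the negligible loss for RealDuration and the larger losses for Intimacy and CommonTastes.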
⁷ In preliminary experiments, duration variables represented as ratios to the age of the respondent have shown a lower strength of association with the TieStrength than the originals. These results have not been reported for the sake of brevity.

5.3.3. Variable selection
Model Complete includes all the variables obtained from the survey. Table 13 illustrates the beta coefficients and the relative significance of the variables. According to these results the variables can be classified in order of importance. The most relevant
group is comprised of RealIntensity, RealDuration, CommonTastes, Intimacy, and Whatsapp. The four variables identified by Granovetter are indeed among the most significant. As already stated in the strength of association analysis (see Section 5.1), the variable Whatsapp is also very significant. Other variables showing a certain relevance are Others, Character and FBUse, in order of importance. Although the variable Character could be expected to have an impact on the tie strength, the variable Others (i.e., a dummy variable that indicates whether the participant and the friend also use social networks other than the most popular to stay in touch) comes initially as a surprise. However, thanks to this result an interesting conclusion can be drawn: if two people use an uncommon social network to communicate, they probably share a special bond. This conclusion requires further investigation in the future as it could identify important niche markets. In terms of error, the Training RMSE and Validation RMSE of Complete are 0.1320208007 and 0.1428565362, respectively, and R² = 0.8305328856. As expected, the complete model has better performance than Model 1; however, the improvement in terms of validation RMSE is only 1.77% approx.

Table 12
Reduced linear models' beta coefficients. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (***) p < 0.001; (**) 0.001 ≤ p < 0.01; (*) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.

[The table is rotated in the source and the pairing of each value with its variable is not recoverable; values are listed in source order.]

Model R1
Variables: (Intercept), Intimacy1–Intimacy6, RealIntensity1–RealIntensity6, CommonTastes.
Coefficient values: −0.0008167792, −0.0287693977, 0.2436864568, 0.2836309662, 0.4254509224, 0.4200209107, 0.0487261272, 0.1928929781, 0.1782487253, 0.3945361107, 0.2511310036, 0.1591292133, 0.395644463, 0.113062614.
Significance markers: 11 (***), 3 ().

Model R2
Variables: (Intercept), Intimacy1–Intimacy6, RealDuration, CommonTastes.
Coefficient values: −0.0232174638, 0.4758944403, 0.0998526739, 0.2610037082, 0.0168288373, 0.4744749963, 0.1762884285, 0.4897159218, 0.616653027.

Model R3
Variables: (Intercept), RealIntensity1–RealIntensity6, RealDuration, CommonTastes.
Coefficient values: −0.0946689079, 0.5287046923, 0.6248044372, 0.5658882361, 0.2843779729, 0.2887962137, 0.3735913069, 0.1638867328, 0.3916713081.

Model R4
Variables: (Intercept), Intimacy1–Intimacy6, RealIntensity1–RealIntensity6, RealDuration.
Coefficient values: 0.4006302956, 0.2358069934, 0.0663746706, 0.2302142843, 0.3142388875, 0.1500463358, 0.5632358151, 0.1524363787, 0.0252177245, 0.1575637923, 0.3136319966, 0.5113639851, 0.590353641, 0.300870918.
Significance markers: 11 (***), 1 (.), 2 ().

Table 13
Beta coefficients of the linear model including all the variables obtained from the survey. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (***) p < 0.001; (**) 0.001 ≤ p < 0.01; (*) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.

Variable                               Coefficient Value
(Intercept)                            −0.0502265304          ()
CountryRespondentDominican Republic     0.0493120983          ()
CountryRespondentItaly                  0.0194899591          ()
CountryRespondentSouth Korea            0.1218674273          ()
CountryRespondentSpain                  0.0042439897          ()
CountryRespondentUK                    −0.0065363919          ()
SexRespondent1                          0.0095698912          ()
AgeRespondent                          −0.1815949016          (.)
FBFriends                              −0.1120226078          ()
Character                               0.0648879392          (*)
FBUse                                   0.0562825939          (*)
AgeFriend                               8.31657995645122E−005 ()
SexFriend1                              0.0013402734          ()
Intimacy1                              −0.0274997999          ()
Intimacy2                               0.0050127256          ()
Intimacy3                               0.1343400635          (***)
Intimacy4                               0.2973986037          (***)
Intimacy5                               0.3072834803          (***)
Intimacy6                               0.3232628227          (***)
FBIntensity1                            0.0159442477          ()
FBIntensity2                            0.0309682562          ()
FBIntensity3                            0.0212767968          ()
FBIntensity4                            0.0418982028          ()
FBIntensity5                            0.0454347325          ()
FBIntensity6                            0.0008603143          ()
RealIntensity1                          0.090100251           (***)
RealIntensity2                          0.1412422451          (***)
RealIntensity3                          0.1324917828          (***)
RealIntensity4                          0.2027933576          (***)
RealIntensity5                          0.2232402106          (***)
RealIntensity6                          0.3070519108          (***)
CommonFriends                           0.1521862564          ()
RealDuration                            0.2622468711          (**)
FBDuration                              0.0150863664          ()
CommonTastes                            0.357199348           (***)
Whatsapp                                0.09002879            (***)
Twitter                                 0.058356333           (.)
Instagram                              −0.0209596998          ()
Flickr                                 −0.1510158076          ()
Google                                  0.0153396018          ()
Others                                  0.0768023769          (**)

To identify the best subset of variables a stepwise regression is
run. Both the forward and the backward procedures provided the
same model, Stepwise, illustrated in Table 14. Indeed, the variables
proposed by Granovetter are included in this model that represents
the best selection of variables for the tie strength among those
Table 14
Beta coefficients of the stepwise-selected linear model. The symbols between the brackets next to the beta values represent the significance of the variable in the model. Legend: (***) p < 0.001; (**) 0.001 ≤ p < 0.01; (*) 0.01 ≤ p < 0.05; (.) 0.05 ≤ p < 0.1; () p ≥ 0.1.

Variable          Coefficient Value
(Intercept)        0.068505009    (.)
Character          0.051745604    (.)
CommonTastes       0.360727604    (***)
FBUse              0.068407339    (***)
Intimacy1         −0.024458423    ()
Intimacy2          0.017839837    ()
Intimacy3          0.14534474     (***)
Intimacy4          0.310844222    (***)
Intimacy5          0.322514495    (***)
Intimacy6          0.351332087    (***)
Others             0.074617638    (***)
RealDuration       0.135811941    (**)
RealIntensity1     0.223615582    (***)
RealIntensity2    −0.019782306
RealIntensity3     0.0392605
RealIntensity4    −0.018611744
RealIntensity5     0.001670093
RealIntensity6     0.022639923
Twitter            0.056447775    (.)
Whatsapp           0.089545628    (***)

Table 15
Non linear models' training RMSE and validation RMSE.

(a) Models including Granovetter's factors. The non linear models are compared to Model 1. The best RMSE are highlighted in bold.

Model             Training RMSE   Validation RMSE
GAMS 1            0.141780        0.145434
RF 1              0.1197373       0.1500345
SVM 1             0.133752        0.1497631
Model 1           0.141780        0.145434

(b) Models including all the variables obtained from the survey. The non linear models are compared to Model Complete and Model Stepwise. The best RMSE are highlighted in bold.

Model             Training RMSE   Validation RMSE
GAMS Complete     0.1319776       0.1404617
RF Complete       0.0650688       0.1409619
SVM Complete      0.1087886       0.1576382
Model Complete    0.132021        0.142857
Model Stepwise    0.1337331       0.138569

In this analysis, the three non linear models considered are tested on two different configurations: (I) the factors defined by Granovetter, corresponding to Model 1; (II) all the variables obtained from the survey, corresponding to Models Complete and Stepwise. Then, their results are compared to those of the corresponding linear models, as shown in Table 15.
In the table, the best RMSE are highlighted in bold and, as per
considered in this research. In terms of error, the Training RMSE the other analysis in this research, the Validation RMSEs are cal-
and Validation RMSE of Stepwise are 0.1337331 and 0.138569, re- culated by Leave-One-Out Cross-Validation. It can be observed that
spectively, and R2 = 0.8257. Compared to Model Complete, Stepwise RF provides the best Training RMSE. However, the best Validation
has a higher Training RMSE but a lower Validation RMSE, although RMSE are obtained by linear models, namely Model 1 for the con-
only by a small fraction. figuration with the four Granovetter factors and Model Stepwise for
the configuration with all the variables. These results show that,
5.3.4. Non linear model considerations for the data considered, allowing local nonlinearity in the predic-
We now explore the effect of including nonlinearities in the tors or considering nonlinear prediction functions does not result
definition of the tie strength. In particular, we analyze the effect of in an improvement of the estimation, supporting the proposal of
(a) considering local nonlinearities in the predictors and (b) con- Granovetter that tie strength can be indeed explained as a linear
sidering models other than regression that are non-linear in na- combination of factors.
ture. These objectives are achieved by (a) studying Generalized Ad- Finally, Fig. 4 illustrates the smooth functions identified by
ditive Models (GAMs), and (b) Random Forsets (RFs) and Support the model GAMS Complete. The variables discarded by the model
Vector Machines (SVMs). are those that present a constant smooth function taking value
GAMs are generalized linear models in which the predicted zero: AgeRespondent, FBFriends, AgeFriend, FBIntensity, Common-
variable depends linearly on unknown smooth functions of some Friends, and FBDuration. Also, by observing the plots, it can be
factors, and thus are capable of identifying local non-linearities at seen that only one variable has a very limited non-linear rela-
factor level. Their generalized formulation is: tionship with the tie strength: FBUse. According to the graph, this
variable reaches a maximum in 0.6, approximatively, meaning that
g(E(Y )) = β0 + f1 (x1 ) + f2 (x2 ) + · · · + fm (xm ) (2)
people that make a moderate use of FB tend to have stronger ties
In the GAMs considered in this study, g() is the identity func- with their FB friends compared to people that use FB a lot or very
tion, and fi () are thin plate spline smooth functions for continu- little.
ous and ordinal factors and identity functions for the nominal vari-
ables. Feature selection in GAMs can be carried out by implement- 5.4. Comparison with other literature’s models
ing a smooth modification technique that penalizes smooth func-
tions having no wiggliness, thus effectively dropping the factors To understand this research’s usability, capabilities and limita-
that have no effect on the predicted variable. tions, the efficiency of several tie strength models proposed in the
RFs are an ensemble learning method that operate by con- literature is now tested.
structing a multitude of decision trees at training time and out- It has been suggested that the strength of a tie can be approx-
putting the mode (classification) or mean prediction (regression) imated by considering the intensity of the relationship between
of the individual trees. Decision trees partition the factor space ac- two individuals (features that represent recency of communica-
cording to value tests, therefore resulting in a non-linear classifica- tion [26] or the interaction frequency [10,27]). According to cor-
tion. Another advantage of using decision trees is that they auto- relation results shown in Fig. 2, the variable RealIntensity presents
matically perform feature selection. a strength of association with TieStrength equal to 0.64565, sug-
An SVM model represents observations as points in space and gesting that there exits a moderate linear relationship between
finds the hyperplane that separates them in the best possible way. the variables. Also, from Fig. 2 it can be easily seen that vari-
SVMs can efficiently perform a non-linear classification using what ables Intimacy and CommonTastes have a higher association. A sim-
is called the kernel trick, implicitly mapping the inputs into high- ple linear regression model built using RealIntensity and TieStrength
dimensional feature spaces. In this work, SVMs with radial basis as the independent and the dependent variables, respectively, re-
kernel are considered. sults in: Training RMSE = 0.2301, Validation RMSE = 0.23323, and
[Figure 4: eleven panels, one per variable, each plotting the estimated smooth term on a y-axis from −0.2 to 0.2: s(AgeRespondent, 0), s(FBFriends, 0), s(Character, 0.75), s(FBUse, 2.58), s(AgeFriend, 0), s(FBIntensity, 0.26), s(RealIntensity, 1.01), s(CommonFriends, 0), s(RealDuration, 0.94), s(FBDuration, 0), s(CommonTastes, 1.49).]

Fig. 4. Smooth functions identified by model GAMS Complete. The value in the y-axis of each plot identifies the effect of the variable on the tie strength value.
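
The Validation RMSE values reported throughout this section are leave-one-out estimates, and the literature comparison relies on single-predictor linear regressions. The following is a minimal sketch of how the Training RMSE and the leave-one-out Validation RMSE of such a model can be computed. The function names and the toy data are illustrative assumptions; they are neither the survey data nor the authors' code.

```python
from math import sqrt


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return sqrt(sum(e * e for e in errors) / len(errors))


def training_rmse(xs, ys):
    """Fit on all observations and evaluate on the same observations."""
    a, b = fit_simple_ols(xs, ys)
    return rmse([y - (a + b * x) for x, y in zip(xs, ys)])


def loocv_rmse(xs, ys):
    """Leave-one-out: refit without observation k, then predict observation k."""
    errors = []
    for k in range(len(xs)):
        a, b = fit_simple_ols(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        errors.append(ys[k] - (a + b * xs[k]))
    return rmse(errors)


# Hypothetical toy data on a 0-1 scale (not the survey data).
intensity = [0.0, 1 / 6, 2 / 6, 3 / 6, 4 / 6, 5 / 6, 1.0, 0.5]
tie_strength = [0.10, 0.20, 0.35, 0.40, 0.70, 0.65, 0.90, 0.60]

print(f"Training RMSE:   {training_rmse(intensity, tie_strength):.4f}")
print(f"Validation RMSE: {loocv_rmse(intensity, tie_strength):.4f}")
```

For ordinary least squares the leave-one-out error is never smaller than the training error, which is why the Validation RMSE is the more conservative figure when comparing models.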

R² = 0.478. These results indicate that relying exclusively on the
intensity is not enough to effectively estimate the tie strength, as it
results on average in a significant error. In fact, on a scale from
zero to 10, the tie strength misestimation is expected to be of
more than 2 levels, on average. This agrees with other authors'
opinions [35], who claim that certain indicators such as frequency
of contact or duration are unnecessary in the tie strength analysis.
For example, it can be misleading to qualify a relationship
as "strong" if the high frequency of contact refers to neighbors or
colleagues.
According to Marsden and Campbell [35], closeness is the only
indicator which can determine the strength of a relationship because
it is independent of the predictors. The results presented in this
research disagree with Marsden and Campbell's [35] statement, as they
show that, in the tie strength computation, including duration,
intensity and, more importantly, common tastes predictors does add
some value. Besides, building a simple linear regression
model using Intimacy and TieStrength as the independent and
the dependent variables, respectively, results in: Training RMSE =
0.1904, Validation RMSE = 0.19257, and R² = 0.643. These results
show that relying exclusively on the intimacy is not enough to
estimate the tie strength, as it leads to a significant error (about
two levels on average, on a zero to ten scale).
On the other hand, Shi et al. [25] suggest that the existence
of a common friend between two individuals indicates the presence
of a strong tie between them. After building the
dummy variable OneFriend, which takes value one when CommonFriends
is greater than or equal to one, and zero otherwise, it can
be observed that in our dataset there are 19 zeros and
516 ones, and the association with TieStrength is only 0.04266. It is
already clear that the variable OneFriend is unfit to discriminate
different levels of tie strength. The performance of a simple linear
regression model built using OneFriend and TieStrength as the
independent and the dependent variables, respectively, is: Training
RMSE = 0.31975, Validation RMSE = 0.3213, and R² = 0.00182.
These results confirm our preliminary evaluation of the effectiveness
of the variable built as suggested by Shi et al.
Other approaches, such as those proposed by Gilbert and
Karahalios [10], Arnaboldi et al. [9], Quijano-Sánchez et al. [8] and
Rodríguez et al. [11], use so many FB and/or Twitter features that
their models are impossible to reproduce, due to the current API
restrictions, or to reuse in other application contexts. For the same
reason, the features proposed by Pappalardo et al. [34] (i.e.,
cardinality of the actors' neighborhoods and dimension relevance [36])
could not be obtained in the context of this study. Therefore, it has
not been possible to test their proposal against this paper's, which
points to reproducibility issues.
As an additional note, the models presented by Arnaboldi et al.
[9] display a larger error than those proposed in this research.
Although the variables adopted in their work are not directly associated
with the ones presented in this paper, there is a lack of variables
representing the "intimacy" between the subjects, which, as shown
in Section 5.1, is this research's most significant variable, while
theirs is recency of communication in FB. Similarly, in Huszti et al.
[33], Petroczi et al. [20] and Lin et al. [31], the features
used in the tie strength computation are so specific to a concrete
SN that their approaches are not generalizable.
This comparison study lets us conclude that the models
presented in this paper indeed allow tie strength to be estimated more
accurately and more generally than those proposed so far in the
literature, mainly because the predictors we identify for each of the
four tie strength components present the following characteristics:

• They are independent of the SN domain.
• They can be successfully represented with proxies that are easy
  to obtain.
• They provide a lower tie strength estimation error.

Next, the presented insights and models are applied to a
concrete tie strength implementation in a financial network.

5.5. Case study: tie strength in a financial network

An example of the application possibilities of this research is
now presented through a client acquisition problem, where the goal
of a first-rate Spanish financial institution is to acquire new clients
and to promote a product among them. To solve this problem, the
objective of identifying and quantifying tie strength relationships
between all actors in an Enterprise Social Network (ESN) [42] is
crucial, as it helps to target marketing campaigns according to the
idea that trusting relationships lead to greater knowledge exchange
[40]. Thus, the immediate goal is to analyze tie strength
relationships in the ESN and recommend, among others, the sequence of
clients that should be contacted to successfully acquire a target of
interest [43].
However, in order to compute the tie strength between two
nodes in this ESN a new problem is faced where, differently from
previous studies [9,10,33] and mainly due to the network's domain
and size, it is impossible to obtain direct feedback from the
network's actors. That is, it is not possible to obtain real tie strength
measures to compare and evaluate a designed model, nor is it possible
to obtain more SN information through extra questionnaires than
that already in possession of the institution.
These limitations can be addressed by applying the methodology
and the insights presented so far. In more detail, in the following
we present the features from the ESN that can be used to represent
the tie strength components presented in this paper, and how to
combine them by using the coefficients identified in Section 5.3. To
verify the validity of the obtained estimation, two experts from the
institution having similar levels of experience and seniority have
provided tie strength evaluations for 193 pairs of clients. The
evaluations given by the experts (which show a Spearman correlation
of 0.8206, indicating very similar rankings) have been averaged to
obtain a single measure of tie strength.⁸

⁸ The pseudonymized dataset can be found at http://portal.uc3m.es/portal/page/portal/ifibid/people/quijano/current_research/, last access on 2017/06/05 17:26:51.

The ESN provided by the financial institution is comprised of
bank clients' operations, where nodes represent users and links
between nodes represent money transactions or other types of
relationships performed by the institution's clients (i.e., at least one
of the nodes is a client of the financial institution). Each node is
characterized by personal data and each edge contains information
about: the source i, the destination j, the type of relationship it
represents r, and a factor $n_{i,j,r} \in [0, 1]$. The different existing
types of relationships are defined in Table 16. Factor $n_{i,j,r}$
represents the intensity of each relationship type r ∈ R originated from
client i and directed to j. This data, given by the financial
institution, is intended as a combined measure of the frequency and
volume of a relationship between users in 2016. An intensity equal to
zero indicates that the users have never shared a relationship of type r
in the time span considered. Note that, for a given r ∈ R, $n_{i,j,r}$ is
not necessarily symmetrical, e.g., i makes monthly transfers to j to
pay the rent but the opposite is not true. An exception to this rule
are the relationships identified as symmetrical in Table 16: in this
case, $n_{i,j,r} = n_{j,i,r}$, where r is a symmetrical relationship.
To address this problem, the tie strength is estimated as a linear
combination of three of the four components suggested: intimacy,
intensity and common tastes. Unfortunately, the time component
had to be disregarded since no information was given by the
financial institution regarding the length of the relationship between
clients. As a consequence, Model R1 (see Section 5.3.2) can be
applied by identifying valid proxies for its factors within the ESN. The
proxies used in this case study are presented in the following.

5.5.1. Intimacy
In Model R1, variable Intimacy is defined as nominal and having
seven levels, from zero to six. The degree of intimacy between i
and j depends on the type of relationship two actors share and the
level of intimacy they reflect. A mapping of the different types of
relationships to corresponding levels of intimacy has been defined
in joint collaboration with experts of the financial institution and
is illustrated in Table 16:

$$\mathrm{INT}(r) \to [0, 1] \qquad (3)$$

where r ∈ R is a type of relationship. The intimacy between i and
j is given by the most intimate relationship type that the actors
share:

$$\mathrm{Intimacy}_{i,j} = \max_{r \in R \,:\, n_{i,j,r} > 0 \,\lor\, n_{j,i,r} > 0} \{\mathrm{INT}(r)\} \qquad (4)$$

5.5.2. Intensity
In Model R1, variable Intensity is defined as ordinal and having
seven levels, from zero to six. The frequency and volume of
relationships between two users, $n_{i,j,r}$, represents a valid proxy for
the intensity. Hence, the level of intensity between i and j is obtained
by rescaling the highest $n_{i,j,r}$ that two users share:

$$\mathrm{Intensity}_{i,j} = \mathrm{round}\Bigl(6 \cdot \max_{r \in R} \{ n_{i,j,r},\, n_{j,i,r} \}\Bigr) \qquad (5)$$

5.5.3. CommonTastes
In Model R1, variable CommonTastes is used as a proxy of
reciprocal services between the actors. This variable is continuous and
ranges between zero and one. In this domain's context, reciprocal
services indicate that relationships exist going from i to j as well
as from j to i:

$$\mathrm{CommonTastes}_{i,j} = \begin{cases} 1 & \text{if } (\exists r \in R : n_{i,j,r} > 0) \land (\exists r \in R : n_{j,i,r} > 0) \\ 0.5 & \text{if } (\exists r \in R : n_{i,j,r} > 0) \oplus (\exists r \in R : n_{j,i,r} > 0) \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

where $\oplus$ is the exclusive disjunction operator.

5.5.4. Results
Table 17 shows the RMSE and the Mean Absolute Error (MAE)
for the tie strength estimations obtained by using the coefficients
of Model R1 and by a linear model trained on the whole dataset.
As explained, the coefficients of Model R1 are used where no
explicit evaluation of tie strength could be obtained. In the dataset
considered, this would result in a MAE of approximately 0.19, meaning
that, on a scale from 0 to 10, an average error of 2 levels could
be expected. On the other hand, in case explicit evaluations of tie
strength are available, the MAE obtained by a linear model would
be approximately 0.14. Both errors are very reasonable, considering the
intrinsic subjectivity of the tie strength evaluations. These results

Table 16
Relationship types. For each type, the table shows a brief description, the corresponding level of
intimacy, and the reciprocity.

Relationship Type   Definition                                         INT   Symmetrical
Rent                i pays the rent to j.                              1     No
Friendly            i and j are friends.                               4     Yes
Proxy               i is a proxy representative of j.                  5     No
Reverse factoring   i pays j through reverse factoring.                4     No
Proposal            i proposes j as the ordering party of a reverse    4     No
                    factoring which is not approved by the bank yet.
Co-owner            i and j co-own an account.                         6     Yes
Discount            i pays j through a discount service.               3     No
Related             i and j are related.                               5     Yes
Subsidiary          i is a company controlled by j.                    4     Yes
Payer               i pays the pension or the salary to j.             2     No
Card                i has a corporative card of j.                     5     Yes
Transfer            i performs a transfer to j.                        3     No
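
The relationship types in Table 16 feed the three proxies of Section 5.5 (Eqs. (4) to (6)). The following is a minimal Python sketch of those proxies. The intimacy levels are copied from Table 16; the edge data, the client names, the half-up rounding rule, and the value 0 returned when two clients share no relationship are illustrative assumptions rather than details given in the paper.

```python
from math import floor

# Intimacy level per relationship type, copied from Table 16.
INT = {
    "Rent": 1, "Friendly": 4, "Proxy": 5, "Reverse factoring": 4,
    "Proposal": 4, "Co-owner": 6, "Discount": 3, "Related": 5,
    "Subsidiary": 4, "Payer": 2, "Card": 5, "Transfer": 3,
}

# Hypothetical edges: (source i, destination j, type r) -> n_{i,j,r} in [0, 1].
# Symmetrical types are stored in both directions with the same value.
edges = {
    ("ana", "bob", "Transfer"): 0.40,
    ("bob", "ana", "Rent"): 0.75,
    ("ana", "eva", "Co-owner"): 0.20,
    ("eva", "ana", "Co-owner"): 0.20,
    ("bob", "eva", "Discount"): 0.05,
}


def directed(i, j):
    """Relationship types with positive intensity going from i to j."""
    return {r: n for (src, dst, r), n in edges.items()
            if src == i and dst == j and n > 0}


def intimacy(i, j):
    """Eq. (4): highest INT(r) over the types shared in either direction."""
    shared = set(directed(i, j)) | set(directed(j, i))
    return max((INT[r] for r in shared), default=0)  # 0 if no tie: assumption


def intensity(i, j):
    """Eq. (5): strongest n in either direction, rescaled to levels 0..6."""
    values = list(directed(i, j).values()) + list(directed(j, i).values())
    strongest = max(values, default=0.0)
    return floor(6 * strongest + 0.5)  # half-up rounding: assumption


def common_tastes(i, j):
    """Eq. (6): 1 if ties exist in both directions, 0.5 if in one, else 0."""
    forward, backward = bool(directed(i, j)), bool(directed(j, i))
    if forward and backward:
        return 1.0
    if forward != backward:  # exclusive disjunction
        return 0.5
    return 0.0


for pair in [("ana", "bob"), ("ana", "eva"), ("bob", "eva")]:
    print(pair, intimacy(*pair), intensity(*pair), common_tastes(*pair))
```

Storing symmetrical relationships in both directions keeps the three functions free of special cases. Python's built-in `round` uses round-half-to-even, so the sketch spells out the rounding rule explicitly instead of relying on it.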

show that the analysis presented in this research can be easily
reapplied to different contexts and is extremely adaptable to
different situations, depending on the availability of proxy variables
or real tie strength values.

Table 17
Error measures for tie strength estimations on the case study data
obtained using the coefficients obtained from Model R1 and estimating
a linear model.

                RMSE        MAE
Model R1        0.2396202   0.1866057
Linear Model    0.1626065   0.1384879

6. Conclusions and future work

In this paper we have studied the viability of several tie strength
models and shown empirically that (1) they provide a lower
estimation error than other approaches presented in the literature, (2)
they are more general and, as a consequence, (3) they are more
reusable. Our models verify Granovetter's statement about the
linearity of tie strength's components: amount of time, intensity,
intimacy and reciprocal services. Besides, generic domain independent
and easily detachable predictors have been proposed for each
component. Additionally, the research includes an explanatory analysis
of the relevance of each component and of several variables that
could be related to the tie strength. To the best of the authors'
knowledge, our explanatory analysis is the first one to study the
impact of each predictor on the tie strength estimation. Regarding
knowledge acquisition, this research has outlined several
interesting conclusions related to the close association between tie
strength and general persuasiveness, the ability to borrow money
and the ability to recommend items. These could serve as starting
points for recommender systems, marketing campaigns or other
researchers in general. Finally, an example of the applicability of
the presented insights and models in different domains has been
illustrated through a case study of a financial network comprised
of clients' operations.
Our tests show that the best estimation model proposed in this
paper, Model Stepwise, is capable of approximating the tie strength
evaluations provided by the participants with a low margin of
error (Validation RMSE of approximately 0.1386). Despite the positive
results, a number of potential improvements are still possible. For
instance, future research should study the impact of factors that
characterize in detail the structure of the network. Given the
current limitations imposed by commercial SN, these variables might
be difficult to obtain or extremely costly. An evaluation of the
trade-off between this cost and the real usefulness of the variables
would certainly represent an important source of information for
decision makers.
We hope that this work will be a useful source of ideas for
future research on tie strength estimation and will contribute
further to the development of more complex and more realistic
approaches for relationship modeling in the context of SM networks.

Acknowledgments

The authors would like to thank the editor and two anonymous
reviewers for their constructive and insightful comments that
improved the quality of the paper.

References

[1] M.D. Choudhury, H. Sundaram, A. John, D.D. Seligmann, Analyzing the dynamics of communication in online social networks, in: Handbook of SN Technologies and Applications, 2010, pp. 59–94.
[2] M. Jamali, M. Ester, Using a trust network to improve top-n recommendation, in: International Conference on Recommender Systems, RecSys '09, 2009, pp. 181–188.
[3] B.O. Holzbauer, B.K. Szymanski, T. Nguyen, A. Pentland, Social ties as predictors of economic development, in: International Conference and School on Network Science, Springer, 2016, pp. 178–185.
[4] G.M. McGuire, W.T. Bielby, The variable effects of tie strength and social resources: how type of support matters, Work Occup. 43 (1) (2016) 3–74.
[5] H. Liang, K.-W. Fu, Network redundancy and information diffusion: the impacts of information redundancy, similarity, and tie strength, Commun. Res. (2016) 1–23.
[6] J. Heidemann, M. Klier, F. Probst, Online social networks: A survey of a global phenomenon, Comput. Netw. 56 (18) (2012) 3866–3878.
[7] J. Golbeck, Generating predictive movie recommendations from trust in social networks, in: International Conference on Trust Management, iTrust '06, 2006, pp. 93–104.
[8] L. Quijano-Sánchez, B. Díaz-Agudo, J.A. Recio-García, Development of a group recommender application in a social network, Knowledge Based Syst. 71 (2014) 72–85.
[9] V. Arnaboldi, A. Guazzini, A. Passarella, Egocentric online social networks: Analysis of key features and prediction of tie strength in facebook, Comput. Commun. 36 (10-11) (2013) 1130–1144.
[10] E. Gilbert, K. Karahalios, Predicting tie strength with social media, in: International Conference on Human Factors in Computing Systems, CHI '09, 2009, pp. 211–220.
[11] S.S. Rodríguez, R.P.D. Redondo, A.F. Vilas, Y. Blanco-Fernández, J.J.P. Arias, A tie strength based model to socially-enhance applications and its enabling implementation: mysocialsphere, Expert Syst. Appl. 41 (5) (2014) 2582–2594.
[12] H.C. White, Identity and Control: How Social Formations Emerge, Princeton University Press, 2008.
[13] A. Socievole, F.D. Rango, A.C. Caputo, Opportunistic mobile social networks: from mobility and facebook friendships to structural analysis of user social behavior, Comput. Commun. 87 (2016) 1–18.
[14] R. Xiang, J. Neville, M. Rogati, Modeling relationship strength in online social networks, in: International Conference on World Wide Web, WWW'10, 2010, pp. 981–990.
[15] J. Tang, X. Hu, H. Liu, Social recommendation: a review, Soc. Netw. Anal. Min. 3 (4) (2013) 1113–1133.
[16] M.O. Nachawati, R. Rabbi, G.E. Yu, L. Kerschberg, A. Brodsky, Social sifter: an agent-based recommender system to mine the social web, in: International Conference on Semantic Technologies for Intelligence, Defense, and Security, STIDS'12, 2012, pp. 125–128.
[17] J. Neville, Ö. Simsek, D.D. Jensen, J. Komoroske, K. Palmer, H.G. Goldberg, Using relational knowledge discovery to prevent securities fraud, in: International Conference on Knowledge Discovery and Data Mining, SIGKDD'05, 2005, pp. 449–458.
[18] P. Domingos, M. Richardson, Mining the network value of customers, in: International Conference on Knowledge Discovery and Data Mining, SIGKDD'01, 2001, pp. 57–66.
[19] R.I.M. Dunbar, V. Arnaboldi, M. Conti, A. Passarella, The structure of online social networks mirrors those in the offline world, Soc. Netw. 43 (2015) 39–47.
[20] A. Petroczi, T. Nepusz, F. Bazso, Measuring tie-strength in virtual social networks, Connections 27 (2) (2007) 39–52.
[21] A. Jøsang, R. Ismail, C. Boyd, A survey of trust and reputation systems for online service provision, Decis. Support Syst. 43 (2) (2007) 618–644.
[22] P. Massa, P. Avesani, Trust-aware recommender systems, in: International Conference on Recommender Systems, RecSys '07, 2007, pp. 17–24.
[23] W.-P. Lee, C. Kaoli, J.-Y. Huang, A smart tv system with body-gesture control, tag-based rating and context-aware recommendation, Knowl. Based Syst. 56 (2014) 167–178.
[24] N.E. Friedkin, Test of structural features of Granovetter's strength of weak ties theory, Soc. Netw. 2 (1980).
[25] X. Shi, L. Adamic, M. Strauss, Networks of strong ties, Physica A 378 (1) (2007) 33–47.
[26] N. Lin, P. Dayton, P. Greenwald, Analyzing the instrumental use of relations in the context of social structure, Sociol. Methods Res. 7 (2) (1978) 149–166.
[27] M.S. Granovetter, The strength of weak ties, Am. J. Sociol. 78 (6) (1973) 1360–1380.
[28] K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, N. Christakis, Tastes, ties, and time: a new social network dataset using facebook.com, Soc. Netw. 30 (4) (2008) 330–342.
[29] A. Madan, M. Cebrian, S. Moturu, K. Farrahi, et al., Sensing the "health state" of a community, IEEE Pervasive Comput. 11 (4) (2012) 36–45.
[30] R.S. Burt, Structural Holes: The Social Structure of Competition, Harvard University Press, 1992.
[31] N. Lin, W.M. Ensel, J.C. Vaughn, Social resources and strength of ties: structural factors in occupational status attainment, Am. Sociol. Rev. 46 (4) (1981) 393–405.
[32] T. Hossmann, G. Nomikos, T. Spyropoulos, F. Legendre, Collection and analysis of multi-dimensional network data for opportunistic networking research, Comput. Commun. 35 (13) (2012) 1613–1625.
[33] E. Huszti, B. Dávid, K. Vajda, Strong tie, weak tie and in-betweens: a continuous measure of tie strength based on contact diary datasets, in: International Conference on Applications of Social Network Analysis, ASNA '13, 2013, pp. 38–61.
[34] L. Pappalardo, G. Rossetti, D. Pedreschi, "How well do we know each other?" detecting tie strength in multidimensional social networks, in: International Conference on Advances in Social Networks Analysis and Mining, ASONAM'12, 2012, pp. 1040–1045.
[35] P.V. Marsden, K.E. Campbell, Measuring tie strength, Soc. Forces 63 (2) (1984) 482–501.
[36] M. Berlingerio, M. Coscia, F. Giannotti, A. Monreale, D. Pedreschi, Foundations of multidimensional network analysis, in: Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on, IEEE, 2011, pp. 485–489.
[37] M.E. Walker, S. Wasserman, B. Wellman, Statistical models for social support networks, in: Advances in Social Network Analysis: Research in the Social and Behavioral Sciences, 1994, pp. 53–79.
[38] D. Liben-Nowell, J.M. Kleinberg, The link-prediction problem for social networks, JASIST 58 (7) (2007) 1019–1031.
[39] B. Taskar, M.F. Wong, P. Abbeel, D. Koller, Link prediction in relational data, in: Advances in Neural Information Processing Systems, NIPS'03, 2003, pp. 659–666.
[40] D.Z. Levin, R. Cross, L.C. Abrams, Why should I trust you? predictors of interpersonal trust in a knowledge transfer context, Acad. Manage. Meeting, 2002.
[41] J. Pagès, Analyse factorielle de données mixtes, Revue de statistique appliquée 52 (4) (2004) 93–111.
[42] K. Berger, J. Klier, M. Klier, A. Richter, "Who is key...?" - characterizing value adding users in enterprise social networks, in: European Conference on Information Systems, ECIS '14, 2014.
[43] L. Quijano-Sanchez, F. Liberatore, The big chase: a decision support system for client acquisition applied to financial networks, Decis. Support Syst. 98 (2017) 49–58.
