2014
Mathematical Contest in Modeling (MCM/ICM) Summary Sheet
The famous ‘80-20’ rule states that, for many events, roughly 80% of the
effects come from 20% of the causes. This principle also applies to network
science: only a few nodes have a significant influence on the whole network.
In this paper, a Relation Distance Model and an Authority-Popularity
Evaluation Model are employed to identify that 20% and analyze its influence.
For requirements 1 and 2, we construct the undirected co-author network from
the 511 × 511 relationship matrix. A Relation Distance Model is proposed on the
basis of the SNA technique. It combines three centrality indexes into a measure
vector and calculates the ‘distance’ to the ideal most influential node.
Another measure (eigenvector centrality), which takes both the degree and the
influence of co-authors into consideration, outputs a new ranking. The model is
validated by comparing the two rankings of the top 15 authors in the Erdos1
network; we find that ALON, NOGA M. is the most influential person in the
Erdos1 network. The degree distribution of the Erdos1 network is shown to
approximately follow a power-law distribution, which indicates that it is a
scale-free network.
Team # 25318 Page 1 of 20
1. Introduction
In this paper, we introduce two methods for analyzing influence and impact in
research networks and other areas of society.
The philosophy behind the scale-free network is discussed, and a shortcut to
boosting influence based on our model is proposed, in section 6.
2. Basic Assumptions
We believe that the exact strength of the co-authorship between two arbitrary
Erdos1 authors is hard to measure by any single criterion. For the sake of
simplicity, in our case, if two authors have co-authored a paper, the
co-authorship index is ’1’; if not, it is ’0’.
We believe that the more often a paper is cited, or the earlier a paper was
published, the more influential it is. In addition, a highly influential
journal contributes more to the influence of a paper.
3. Definitions
Step 1. We extract the 511 Erdos1 authors from over 18,000 lines of raw data
in the Erdos1 file. This is an easy task in Microsoft Excel 2010: we eliminate
the names that are not followed by a date.
Step 2. To obtain the relationships among the 511 Erdos1 authors, we remove
the Erdos2 authors from each Erdos1 author's list of co-authors by using the
library function ’countif()’ in Microsoft Excel 2010.
...
4.2.1 Overview
In order to build the Erdos1 network and analyze its properties, the social
network analysis (SNA) technique is employed. Social network analysis refers
to methods used to analyze social network structures made up of individuals
called ’nodes’, which are tied (connected) by one or more specific types of
interdependency. In our case, the Erdos1 authors are viewed as nodes and the
co-authorships (obtained from subsection 4.1) as the links among them.
Figure 1: The co-author network of the Erdos1. For the sake of observability, we choose the
top 50 Erdos1 authors in the priority list in subsection 4.2.2.
4.2.2 Methodology
In the following, we will finish the steps to build up and validate the model.
Step 1. Calculate the centrality measure vector $\vec{A}_x$ of the nodes in the
network.
where $n$ is the total number of nodes in the network, and $a_{ix}$ is a
variable indicating the weighted number of co-authorships between nodes $x$ and
$i$. According to Assumption 1, in our Erdos1 network, $a_{ix} = 1$ or $0$ for
all $i$.
$$C_b(x) = \sum_{i}^{n} \sum_{j}^{n} g_{ij}(x)$$
where $g_{ij}(x)$ indicates whether the shortest path between two other nodes
$i$ and $j$ passes through the node $x$.
where $l_{ix}$ is the length of the shortest path connecting nodes $i$ and $x$.
The shortest paths are calculated with the Floyd algorithm.
• Degree centrality shows the number of connections a node has, which also
  reflects the connectivity of nodes in a network. Nodes with high
  connectivity can be viewed as more influential.
• Betweenness centrality shows the number of shortest paths that pass
  through a certain node. It also reveals how much other nodes depend on
  that node. If many other nodes depend on a node, the node must be very
  important for the whole network. That is, a large betweenness centrality
  value of a node corresponds to its high importance in the network.
• Closeness centrality measures how far away a node is from the other
  nodes. A small closeness value of a node therefore reflects its high
  importance.
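The three measures can be sketched in a few lines of Python. This is only an illustrative sketch, not the UCINET computation used in the paper, and it rests on several assumptions: the network is given as a symmetric 0/1 adjacency matrix `adj` (a hypothetical name), the graph is connected, closeness is taken as the sum of shortest-path lengths $l_{ix}$ (so that smaller means more central, as described above), and $g_{ij}(x)$ is read as "at least one shortest path between $i$ and $j$ passes through $x$".

```python
from itertools import product

INF = float("inf")


def floyd_warshall(adj):
    """All-pairs shortest-path lengths for an unweighted graph (Floyd algorithm)."""
    n = len(adj)
    dist = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
            for i in range(n)]
    # product() varies its first element slowest, so k is the outermost loop.
    for k, i, j in product(range(n), repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def centralities(adj):
    """Return (degree, betweenness, closeness) lists, one entry per node."""
    n = len(adj)
    dist = floyd_warshall(adj)
    # Degree: number of co-authors, i.e. the row sum of the 0/1 matrix.
    degree = [sum(row) for row in adj]
    # Closeness (assumed form): total shortest-path length to all other nodes.
    closeness = [sum(dist[x][i] for i in range(n) if i != x) for x in range(n)]
    # Betweenness (assumed form): number of unordered pairs (i, j) with a
    # shortest path through x. Counting ordered pairs would only double every
    # value, which disappears once the values are divided by their maximum.
    betweenness = []
    for x in range(n):
        count = 0
        for i in range(n):
            for j in range(i + 1, n):
                if x in (i, j) or dist[i][j] == INF:
                    continue
                if dist[i][x] + dist[x][j] == dist[i][j]:
                    count += 1
        betweenness.append(count)
    return degree, betweenness, closeness


# Toy example: a path 0 - 1 - 2 - 3 (not data from the paper).
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
degree, betweenness, closeness = centralities(path)
```

On the toy path, the two inner nodes have the larger degree and betweenness and the smaller closeness, which matches the interpretation of the three measures given above.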
According to the above, we define a vector which contains the three measures
in the form below:
$$\vec{A}_x = \left( \frac{C_d(x)}{\max C_d(x)}, \frac{C_b(x)}{\max C_b(x)}, \frac{C_c(x)}{\max C_c(x)} \right)$$
$\vec{A}_x$ is called the ’measure vector’. It can also be represented in
another form:
$$\vec{A}_x = (A_{x1}, A_{x2}, A_{x3})$$
where $A_{xi}$ ($i = 1, 2, 3$) stands for element $i$ of the vector
$\vec{A}_x$. Each of the three elements is normalized by dividing it by its
maximum value. According to the definitions of the three centralities,
$\vec{A}_x$ reaches its optimal value when degree ($A_{x1}$) and betweenness
($A_{x2}$) reach their largest value 1 and closeness ($A_{x3}$) reaches its
smallest value 0.
Therefore, the ideal author with the most significant influence within the
network will have the measure vector $\vec{A}_x = (1, 1, 0)$.
The ideal vector is defined as $\vec{A}^C$:
$$\vec{A}^C = (A_1^C, A_2^C, A_3^C) = (1, 1, 0)$$
The distance $D_C(x)$ from a node's measure vector to the ideal vector is the
key to determining who in the Erdos1 network has significant influence within
the network. A larger distance indicates a lower possibility of being an
author with significant influence. The Erdos1 authors are arranged into an
Influence and Impact priority list, ordered by the value of $D_C(x)$. A node
with a smaller $D_C(x)$ value ranks higher in the priority list, since
$D_C(x)$ is the distance from the node to an ideal ’most significant’ node.
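The ranking step can be sketched as follows. This is a hedged illustration: the source text here does not show the exact form of $D_C(x)$, so the sketch assumes the Euclidean distance between the measure vector and the ideal vector (1, 1, 0). The function names and the toy input values are hypothetical.

```python
import math


def measure_vectors(degree, betweenness, closeness):
    """Normalize each centrality by its maximum to build the measure vectors."""
    # "or 1" avoids dividing by zero when a measure is zero for every node.
    max_d = max(degree) or 1
    max_b = max(betweenness) or 1
    max_c = max(closeness) or 1
    return [(d / max_d, b / max_b, c / max_c)
            for d, b, c in zip(degree, betweenness, closeness)]


def rank_by_distance(vectors, ideal=(1.0, 1.0, 0.0)):
    """Order nodes by distance to the ideal vector, smallest distance first.

    Assumption: D_C(x) is the Euclidean distance.
    """
    distances = [math.dist(v, ideal) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda x: distances[x])
    return order, distances


# Toy centralities for a 4-node path graph (not data from the paper).
vectors = measure_vectors(degree=[1, 2, 2, 1],
                          betweenness=[0, 2, 2, 0],
                          closeness=[6, 4, 4, 6])
order, distances = rank_by_distance(vectors)
# order lists node indices from most to least influential under this sketch.
```

Because closeness is normalized by its maximum and is never zero in a connected graph with more than one node, no real node reaches the ideal vector exactly; only the relative distances matter for the ranking.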
The top 15 authors of the Influence and Impact priority list are shown below
in Table 2. For the sake of observability, we choose the top 50 Erdos1 authors
in the impact priority list to draw the network in UCINET, as shown in Figure 1.
We can see from Table 2 that HARARY, FRANK* has the most significant
influence within the co-author network. To apply the model to a specific
university, department, or journal, the relationship matrix data should be
collected for that network. The centrality indexes can then be calculated and
a ranking obtained from our model. Weights on nodes or links may be added if
necessary, and we will discuss such networks in section 5.
Figure 2: Measure vectors of the 511 Erdos1 authors. The big red point (1, 1, 0) stands for the
ideal vector of the most crucial author.
Eigenvector centrality
For the second question in Requirement 2, we now consider who has published
important works or connects important researchers within Erdos1.
Calculating the eigenvector centrality values of the 511 nodes in UCINET, we
get the top 15 nodes shown below in Table 3.
We can see from Table 3 that ALON, NOGA M. is the person who connects
more important researchers within Erdos1 than anyone else.
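For readers without UCINET, eigenvector centrality can be approximated by power iteration. The sketch below is an assumption-laden illustration rather than UCINET's implementation: it assumes a connected, undirected network given as a symmetric 0/1 adjacency matrix, iterates with $A + I$ instead of $A$ (which leaves the leading eigenvector unchanged but avoids oscillation on bipartite graphs), and scales the scores so that the largest is 1.

```python
def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Leading eigenvector of the adjacency matrix via shifted power iteration."""
    n = len(adj)
    score = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (A + I): each node keeps its own score and adds the
        # scores of its neighbours, so well-connected neighbours count more.
        new = [score[i] + sum(adj[i][j] * score[j] for j in range(n))
               for i in range(n)]
        norm = max(new)
        new = [v / norm for v in new]
        if max(abs(a - b) for a, b in zip(new, score)) < tol:
            return new
        score = new
    return score


# Toy example: a star with node 0 in the centre (not data from the paper).
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
scores = eigenvector_centrality(star)
# The centre gets score 1.0; each leaf gets about 0.577 (1 / sqrt(3)).
```

Unlike degree centrality, a node's score here depends on the scores of its neighbours, which is why the two rankings in the paper can disagree.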
Comparing the two ranking methods, the majority of the authors in the first
table also appear in the second table. However, some authors in the first
table are missing from the second top list. Why? Next, we examine this
question through the case of HARARY, FRANK*.
The degree of HARARY, FRANK* is the maximum, 44, which means he coauthored
with 44 other Erdos1 authors. When we study the relatively most influential
authors in the network, we discover that only 30 authors are ’masters’
within the network. That is to say, most of the co-authors of HARARY, FRANK*
are not highly influential authors. So, when we consider both the degree and
the influence of an author's co-authors, authors whose degree ranks high may
not be ’masters’ within the network. The weakness of the Relation Distance
Model is therefore its susceptibility to large degree values.
In subsection 4.2, we obtained each node's degree centrality, so the degree
distribution of the network can be derived. We now focus on the degree
distribution of the Erdos1 network. Using the algorithm proposed by Clauset
[2007], which combines maximum-likelihood fitting with goodness-of-fit tests,
we find that the distribution of the degree centrality k in the Erdos1 network
approximately follows a power law, with an estimated power-law exponent of 1.6:
$$P(K > k) \sim k^{-1.6}$$
The degree distribution of the Erdos1 network and the approximate power-law
distribution are shown below in Figure 3.
Figure 3: The degree distribution of the Erdos1 network and the approximate power-law
distribution.
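The maximum-likelihood part of such a fit can be sketched as below. This is a simplified illustration, not the full procedure cited above: it uses the continuous-data estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$ for the density exponent, takes `x_min` as given rather than selecting it, and omits the goodness-of-fit test. Degrees are integers, so the continuous form is only an approximation here. For a continuous power law with density exponent $\alpha$, the tail $P(K > k)$ decays with exponent $\alpha - 1$, so a tail exponent of 1.6 corresponds to $\alpha = 2.6$. The synthetic sampler is a hypothetical helper used only to exercise the estimator.

```python
import math
import random


def fit_power_law_exponent(samples, x_min):
    """Continuous maximum-likelihood estimate of the density exponent alpha."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, size, rng):
    """Draw synthetic power-law samples by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]


# Synthetic check (not the Erdos1 data): recover a known exponent.
rng = random.Random(0)
data = sample_power_law(alpha=2.6, x_min=1.0, size=20000, rng=rng)
alpha_hat = fit_power_law_exponent(data, x_min=1.0)
tail_exponent = alpha_hat - 1.0  # exponent of P(K > k)
```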
No.18: Bollobás, B., Random Graphs, Academic Press, New York, 2nd ed., 2001.
No.20: Snijders, T. A. B., Markov chain Monte Carlo estimation of exponential random graph
models, Journal of Social Structure 3(2), 2002.
No.21: Watts, D. J., Small Worlds, Princeton University Press, Princeton, 1999.
No.23: Luis A. Nunes Amaral, Antonio Scala, Marc Barthélemy, H. Eugene Stanley, Classes of
small-world networks, Proceedings of the National Academy of Sciences, 2000.
No.24: M. Barthélemy, Small-world networks: Evidence for a crossover picture, Physical
Review Letters, 1999.
• Explanation 3: For ease of description, we give each paper an ID: ’1’
  for the first paper in NetSciFoundation.pdf, ’2’ for the second, ’17’ for
  the first paper in the additional list, etc.
work. The Modified PageRank Algorithm includes two steps: Initialization and
Iteration.
Step 1: Initialization
In our citation network, the initial PR value of each paper is defined as the
normalized impact factor of the journal in which the paper is published. The
absolute impact factor values (collected from the web [6]), together with the
ID of each paper, are listed in Table 4. For instance, the weight of paper No.4,
’Emergence of scaling in random networks’, published in Science, is equal to
the Impact Factor of Science, 31.027.
where $L(i) = \sum_{j=1}^{24} G_{ji}$
Step 2: Iteration
$$PR_i(k) = q \sum_{j} \frac{PR_j(k)}{L(j)} + \frac{1 - q}{n}, \qquad i = 1, 2, \ldots, n$$
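The two steps can be sketched in Python as follows. This is a hedged sketch, not the authors' MATLAB program, and it makes several assumptions that the text here does not confirm: `G[j][i] = 1` means that paper i links to paper j, so that the column sum gives $L(i)$; "normalized" impact factor means divided by the sum of all impact factors; each iteration computes new values from the previous ones; and the values are rescaled to sum to 1 after every step so that papers with no outgoing links do not drain the total. All names and toy values are hypothetical.

```python
def modified_pagerank(G, impact_factors, q=0.85, iterations=200, tol=1e-12):
    """PageRank initialized with normalized journal impact factors."""
    n = len(G)
    # Step 1: Initialization with normalized impact factors (assumed: / sum).
    total_if = sum(impact_factors)
    pr = [f / total_if for f in impact_factors]
    # L(i): number of outgoing links of paper i (column sum of G).
    out_links = [sum(G[j][i] for j in range(n)) for i in range(n)]
    # Step 2: Iteration.
    for _ in range(iterations):
        new = []
        for i in range(n):
            # Paper j passes an equal share PR_j / L(j) to every paper it links to.
            inflow = sum(G[i][j] * pr[j] / out_links[j]
                         for j in range(n) if out_links[j])
            new.append(q * inflow + (1.0 - q) / n)
        total = sum(new)
        new = [v / total for v in new]  # assumed rescaling step
        if max(abs(a - b) for a, b in zip(new, pr)) < tol:
            return new
        pr = new
    return pr


# Toy network: papers 0 and 1 link to paper 2, paper 2 links to paper 0.
G = [[0, 0, 1],
     [0, 0, 0],
     [1, 1, 0]]
ranks = modified_pagerank(G, impact_factors=[1.0, 2.0, 3.0])
# Paper 2 receives links from both other papers and ends up ranked first.
```

In this toy case the final ranking is driven by the link structure; the impact factors only set the starting point of the iteration.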
The result shows that paper No.18, Random Graphs, written by Bollobás, has
the highest PageRank, so it is the most authoritative paper among the 24
papers. Authority reflects the depth of influence. To find the most
influential paper, we also have to measure the width of the influence:
popularity. We use the number of citations to reflect popularity in the
citation network. Since the number of citations depends on the year in which a
paper was published, we use the number of citations per year as the parameter,
which is shown in Table 6.
Since the weight is attached to the links instead of the nodes in the co-star
network, the PageRank algorithm has to be adapted here. In the original
PageRank algorithm, the PageRank value of node i is distributed evenly, in
every step of the iteration, among the nodes that i points to. In our improved
PageRank algorithm, how much PageRank value node i distributes to each node it
points to is determined by the weight of the link (the IMDb rating [1] of the
movie): the PageRank value distributed is proportional to the weight. This is
reasonable because an excellent movie can make movie stars more influential.
For example, suppose the weight of the link (i to k) is 3 and the weight of
the link (i to n) is 4. Then the PageRank value that i distributes to k is
$\frac{3}{3+4} \times 1$ and the PageRank value that i distributes to n is
$\frac{4}{3+4} \times 1$, as shown in Figure 8.
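$$G_j = [0 \;\; 1 \;\; 0 \;\; 1 \;\; 0]^T$$
So we get
$$G'_{2j} = \frac{3}{3+4}(1 + 1) = \frac{6}{7}, \qquad G'_{4j} = \frac{4}{3+4}(1 + 1) = \frac{8}{7}$$
$$G'_j = \left[ 0 \;\; \frac{6}{7} \;\; 0 \;\; \frac{8}{7} \;\; 0 \right]^T$$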
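The column re-weighting in this example can be reproduced with exact fractions. The sketch below is illustrative only: the function name and the `weights` list (link weights 3 and 4 placed at the positions of the two links) are hypothetical, and it assumes that each entry is the link's share of the total weight multiplied by the number of links in the column, which matches the factor (1 + 1) above and keeps the column total unchanged.

```python
from fractions import Fraction


def reweight_column(column, weights):
    """Redistribute a 0/1 link column in proportion to the link weights."""
    link_count = sum(column)  # the factor (1 + 1) in the worked example
    total_weight = sum(w for g, w in zip(column, weights) if g)
    return [Fraction(w, total_weight) * link_count if g else Fraction(0)
            for g, w in zip(column, weights)]


G_j = [0, 1, 0, 1, 0]
weights = [0, 3, 0, 4, 0]  # link weights 3 and 4, as in the example
G_j_prime = reweight_column(G_j, weights)
# G_j_prime is [0, 6/7, 0, 8/7, 0]; its entries still sum to 2.
```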
The other steps are the same as in subsection 5.2. We also implement the
improved PageRank algorithm in MATLAB R2012b, and the result is shown below in
Table 7.
Table 7: PR value and number of Google search results for each star.
$$PR_i(k) = q \sum_{j} \frac{PR_j(k)}{L(j)} + \frac{1 - q}{n}, \qquad i = 1, 2, \ldots, n$$
star. However, some movies do not need the most famous actor, so every movie
star has the opportunity to play a starring role. This situation is reflected
by the coefficient q, so the value of q can influence the PageRank of the
network. The default value of q is 0.85, and we take the PageRank for q = 0.85
as the standard. As q changes, we obtain the corresponding PageRank and
analyze the similarity between this PageRank and the standard. The results are
shown in Figure 10.
The results show that, within a certain range, the PageRank changes slowly as
q changes, so our model has high stability.
Strengths
• Simplicity and Accuracy: The programs of the model are easy to under-
stand, and the calculations are precise.
Weaknesses
Barabási and Albert [1999] pointed out that the ER random graph and the WS
small-world model neglect two important characteristics of actual networks.
The dynamics in the scale-free network is the philosophy behind the famous
Pareto principle[9] (also known as the 80-20 rule), which states that, for many
events, roughly 80% of the effects come from 20% of the causes.
References
[1] http://www.imdb.com/
[4] http://en.wikipedia.org/wiki/Centrality
[6] Barabási, A.-L., and Albert, R. Emergence of scaling in random networks.
Science, 286:509-512, 1999.
[7] http://www.impactfactorsearch.com
[8] http://www.cs.virginia.edu/oracle/
[9] http://en.wikipedia.org/wiki/Pareto_principle