EXPLORING ARTICLE NETWORKS
ON WIKIPEDIA WITH NODEXL
PRESENTATION DESCRIPTION
• With 4.8 million articles in the English version of Wikipedia, this crowd-sourced online
encyclopedia is regularly one of the top-ten visited sites online. For many, this is the go-to
source for a first read on a topic. The open-source and free Network Overview, Discovery
and Exploration for Excel (NodeXL), which is an add-on to Microsoft Excel, enables the
capture of “article networks” from Wikipedia. Such content network analysis-based data
visualizations enable the development of research leads; some understandings of public
conceptualizations of related concepts, peoples, events, and phenomena; the profiling of
Wikipedia editors (both humans and ‘bots), and other research insights. This presentation will
showcase this affordance of NodeXL and provide some ideas for practical applications of this
channel of research and knowing.
2
OVERVIEW
• Wikipedia ethos and practices
• Wikipedia
• The many Wikipedias; the English Wikipedia
• The Wikimedia Foundation
• MediaWiki and basic functionalities
• Basic article network analysis
• NodeXL and basic functionalities; automation
3
OVERVIEW (CONT.)
• http page networks on Wikipedia:
• article networks
• human author / editor networks
• robot networks
• Live demos
• Other (future) networks from Wikipedia
4
WIKIPEDIA ETHOS AND PRACTICES
• Objective, fact-based, and
research-focused
• Full research citations
• Isolating of opinions into Talk pages
• Open
• Open-access
• Open-source software; text released under free licenses (CC BY-SA and GFDL)
• Crowd-sourced knowledge co-creation;
curated public data
• Crowd-funded 501(c)(3); transparent
finances ($58.5 million goal for FY
2015)
• Editing via email-verified accounts
or Internet Protocol (IP) capture
5
WIKIPEDIA
THE MANY WIKIPEDIAS
• 288 Wikipedias (with 277 active)
• In order of articles: English (13.9%),
Swedish (5.6%), Dutch (5.2%), German
(5.25%), French (4.6%), Waray-Waray
(3.6%), Russian (3.5%), Cebuano
(3.4%), Italian (3.4%), Spanish (3.4%),
and Other (48.2%)
• (“List of Wikipedias” on Wikipedia)
THE ENGLISH WIKIPEDIA
• Founded on Jan. 15, 2001
• 4.8 million articles
• 25 million user accounts
• 1,347 administrators (“English
Wikipedia” on Wikipedia)
6
THE WIKIMEDIA FOUNDATION
• Objective: to encourage “the growth, development and distribution of free,
multilingual, educational content,” and to provide “the full content of these
wiki-based projects to the public free of charge”
• A range of projects: Wikipedia, Wikibooks, Wikiversity, Wikimedia
Commons, Wiktionary, Wikiquote, Wikivoyage, Wikidata, Wikinews,
Wikisource, Wikispecies, and MediaWiki (Wikimedia Foundation)
7
MEDIAWIKI AND BASIC FUNCTIONALITIES
• “wiki wiki”: “quick” or “fast” in Hawaiian
• Ward Cunningham as the developer of the first wiki software (WikiWikiWeb) in 1994 to
enable online collaborations with history versioning and rollback capabilities
• MediaWiki first created for Wikipedia in 2002; now maintained by the Wikimedia Foundation (founded in 2003)
• Magnus Manske and Lee Daniel Crocker were the initial developers of this tool using PHP
(MediaWiki)
8
A WIKIMEDIA ARTICLE INTERFACE
9
A VIEW OF THE REVISION HISTORY
10
BASIC ARTICLE NETWORK ANALYSIS
• Basics of network graphs: nodes-links, entities-relationships, vertices-edges;
undirected or directed (digraphs) graphs; networks and meta-networks;
subgraphs and clusters, motifs; network centrality
• Direct ties represented in ego neighborhoods (with a maximum geodesic
distance or graph diameter of 2); also 1.5 degree ties for transitivity (with a
maximum geodesic distance or graph diameter of 3) and 2 degree ties to
include networks of the respective “alters” (with much larger maximum
geodesic distances possible)
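
The three neighborhood levels above can be sketched in a few lines of Python. This is a minimal illustration under one common reading of the levels (1 = ego-to-alter ties, 1.5 = plus ties among the alters, 2 = plus the alters' own ties); the function name `ego_network` and the toy page titles are hypothetical, and NodeXL's importer may draw the boundaries differently.

```python
from collections import defaultdict


def ego_network(edges, ego, level):
    """Return the directed edges kept in the ego network of `ego`.

    level 1   : edges between the ego and its alters
    level 1.5 : level 1 plus edges among the alters
    level 2   : level 1.5 plus edges between alters and their own neighbors
    """
    if level not in (1, 1.5, 2):
        raise ValueError("level must be 1, 1.5, or 2")

    # Treat links as undirected when deciding who counts as an alter.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    alters = neighbors[ego]

    keep = set()
    for src, dst in edges:
        touches_ego = ego in (src, dst)
        among_alters = src in alters and dst in alters
        touches_alter = src in alters or dst in alters
        if touches_ego:
            keep.add((src, dst))
        elif level >= 1.5 and among_alters:
            keep.add((src, dst))
        elif level == 2 and touches_alter:
            keep.add((src, dst))
    return keep


# Hypothetical toy edge list; the titles are made up for illustration.
EDGES = [
    ("Seed", "A"), ("Seed", "B"),  # ego-to-alter ties
    ("A", "B"),                    # tie among alters
    ("B", "C"),                    # alter-to-outsider tie
    ("C", "D"),                    # beyond two steps from the ego
]

for level in (1, 1.5, 2):
    print(level, sorted(ego_network(EDGES, "Seed", level)))
```

Each level adds edges to the previous one, which is consistent with the vertex and edge counts growing quickly from the 1-degree to the 2-degree extractions.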
11
BASIC ARTICLE NETWORK ANALYSIS (CONT.)
• Entities may be individuals or groups, contents, and other elements
• Relatedness: Article networks created based on in-links and outlinks; node
“degree”
• Other types of relatedness are possible such as based on word co-occurrences, title
relatedness (same synset or “synonym set”), shared categories, and others
• Relations are conceptualized as enabling paths
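
As a rough illustration of node “degree” from in-links and outlinks, the sketch below counts both for each page in a directed edge list. The helper `degree_table` and the sample links are hypothetical and are not taken from an actual extraction.

```python
from collections import Counter


def degree_table(edges):
    """Count in-links, outlinks, and total degree for every page."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    nodes = set(out_degree) | set(in_degree)
    return {
        node: {
            "in": in_degree[node],
            "out": out_degree[node],
            "degree": in_degree[node] + out_degree[node],
        }
        for node in nodes
    }


# Hypothetical (source page, linked page) pairs.
LINKS = [
    ("Wiki", "MediaWiki"),
    ("Wiki", "Ward Cunningham"),
    ("MediaWiki", "Wiki"),
]

for page, counts in sorted(degree_table(LINKS).items()):
    print(page, counts)
```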
12
NODEXL AND BASIC FUNCTIONALITIES;
AUTOMATION
• A free and open-source add-on to Microsoft Excel available on the Microsoft
CodePlex platform
• Enables…
• Graph visualization (with datasets from UCINET, GraphML, and other types)
• Data extraction from a number of social media platform APIs; refreshed runs based on
the same parameters (macros)
• Large number of tools of graph analysis
• A number of layout algorithms and selections to represent the data visually
13
HTTP PAGE NETWORKS ON WIKIPEDIA
(IN THIS CASE)
• http page links within Wikipedia, not connecting out to the Surface Web
• One-directional (outlink) directed graph of the target Wikipedia page
• May include article page networks, human page networks, robot page networks, and
others
• Networks seeded by one target title or name (as long as the string appears as a
page in Wikipedia)
• No need for an application programming interface (API) on the MediaWiki platform
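
To make the idea of a one-directional outlink graph concrete, here is a minimal sketch that turns the internal links in a page's wikitext into (seed, target) edges. It assumes the raw wikitext is already in hand and is not a description of how the NodeXL importer works internally; the regular expression, the function `outlink_edges`, the skipped prefixes, and the sample text are all illustrative assumptions.

```python
import re

# Matches [[Target]], [[Target|label]], and [[Target#Section]];
# group 1 is the target page title without section or label.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def outlink_edges(seed_title, wikitext,
                  skip_prefixes=("File:", "Image:", "Category:")):
    """Return de-duplicated (seed, target) edges for internal links."""
    edges = []
    seen = set()
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Skip empty targets, media/category links, and repeats.
        if not target or target.startswith(skip_prefixes) or target in seen:
            continue
        seen.add(target)
        edges.append((seed_title, target))
    return edges


# Hypothetical wikitext fragment for illustration.
SAMPLE = (
    "A [[wiki]] runs on [[wiki software|software]] such as "
    "[[MediaWiki#History]]. [[File:Logo.png|thumb]] See [[wiki]]."
)

print(outlink_edges("Wiki", SAMPLE))
```

Every edge starts at the seed page, which is why a 1-degree extraction of this kind has roughly as many edges as vertices.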
14
MEDIAWIKI
ARTICLE
NETWORK ON
WIKIPEDIA
(1 DEG., 237 VERTICES, 237
EDGES)
15
MEDIAWIKI ARTICLE
NETWORK ON
WIKIPEDIA
(1.5 DEG., 12,368 VERTICES AND
17,686 UNIQUE EDGES)
16
MEDIAWIKI
ARTICLE
NETWORK ON
WIKIPEDIA
(2 DEG., 923,006 VERTICES)
17
In the first run, the software
threw an “out of memory”
exception and crashed.
Another run was conducted on a
different machine with more
processing capability. The
screenshots are from that data
extraction. The extracted data
included some edge pairs (over
half a dozen) in which one of the
vertices was missing.
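
Edge pairs with a missing vertex can be set aside before analysis. The following is a small sketch, assuming the edge list has been exported as (source, target) pairs in which a missing vertex shows up as an empty string or `None`; the function name and sample rows are hypothetical.

```python
def split_incomplete_edges(rows):
    """Separate usable edges from pairs with a missing vertex."""
    complete, incomplete = [], []
    for src, dst in rows:
        # Normalize None and surrounding whitespace before checking.
        src = (src or "").strip()
        dst = (dst or "").strip()
        if src and dst:
            complete.append((src, dst))
        else:
            incomplete.append((src, dst))
    return complete, incomplete


# Hypothetical exported rows; two of them lack a vertex.
ROWS = [
    ("MediaWiki", "PHP"),
    ("MediaWiki", ""),
    (None, "Wiki"),
    (" Wiki ", "MediaWiki"),
]

good, bad = split_incomplete_edges(ROWS)
print(len(good), "usable edges;", len(bad), "incomplete pairs")
```

Keeping the incomplete pairs in a separate list, rather than dropping them silently, makes it possible to report how many were excluded.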
EXAMPLE: ARTICLE NETWORK
• Who are individuals related to a topic? Events? Years? Topics? Which of
these may be useful leads to learn more about the basic seed topic?
• Based on a real-world individual, what is he or she known for? Who are
people that this person is connected with?
• Based on a technology, when was it originated? Who originated it? What
were precursor inventions? What inventions were linked to the particular
technology?
18
EXAMPLE: ARTICLE NETWORK (CONT.)
• Based on collected lists, who is on a target list, and for what?
• Based on a particular topic, are there gaps in the information based on
“missing” article links?
• Based on a particular phenomenon, event, phrase, or individual, in a foreign
context and foreign language, what may be learned?
19
WIKI ARTICLE
NETWORK ON
WIKIPEDIA
(1 DEG., 162 VERTICES)
20
WEB_LOG_
ANALYSIS_
SOFTWARE
ARTICLE
NETWORK
ON
WIKIPEDIA
(1 DEG., 13 VERTICES)
21
EXAMPLE: HUMAN (AUTHOR / EDITOR) USER
NETWORK
• Based on the human user’s network on Wikipedia, what articles does he or she
tend to edit? In total, what does this network suggest about the person behind
the edits?
• (This requires the existence of a user page though.)
22
USER:LWEDEKIND
NETWORK ON
WIKIPEDIA
(1 DEG., 9 VERTICES)
23
USER:THIS_LOUSY_
T-SHIRT ARTICLE
NETWORK ON
WIKIPEDIA
(1 DEG., 30 VERTICES)
24
EXAMPLE: ROBOT NETWORK
• Based on the approved robot user’s network, what are the interests of the
maker of the robot? What other accounts is the robot connected to?
25
USER:OGREBOT
NETWORK ON
WIKIPEDIA
(1 DEG., 5 VERTICES)
26
USER:EMAUSBOT
NETWORK ON
WIKIPEDIA
(1 DEG., 2 VERTICES)
27
ADDITIONAL APPROACHES
• Chaining from one target account to related others
• Cross-comparing information on the Wikipedia site with the extracted
networks
• Connecting the Wikipedia information with related sites on the Surface Web /
World Wide Web (WWW) and Internet
28
OTHER (FUTURE) NETWORKS FROM WIKIPEDIA
• The third-party tool to NodeXL has spaces to enable user-content (two-mode)
network extractions and the mapping of co-editing networks…but those
functions are not currently enabled (apparently)
29
DISCUSSIONS
• Questions?
• Ideas for research?
30
