Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester
Tools for Data Manipulation
UKAD Open Refine Workshop, Jisc London, 18th March 2016
Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/
Workshop Resources
Available from:
http://data.archiveshub.ac.uk/workshops/ukad2016/readme.html
Link to Open Refine and plugins
Link to example data used for workshop
Link to completed Open Refine project from today's workshop
Open Refine
OpenRefine (formerly Google Refine) is a powerful tool for
working with messy data: cleaning it; transforming it from
one format into another; and extending it with web
services and external data.
Main Uses:
• Explore data
• Clean and transform data
• Reconcile and match data
Installing and running Open Refine
Download from:
http://openrefine.org/download.html
Run Open Refine, then in a web browser go to: http://127.0.0.1:3333/
Select ‘Create Project’ and browse for the Archives Hub example CSV data file
Note: You may need to clear the browser cache to see new projects
Clean and Transform - Facets and Clustering
Strip leading and trailing white space
Transform to upper case or title case
Split multi-valued cells, or Edit column > Split into several columns
Facet on label
Order by count
Cluster and rename rows
Undo
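The clustering step groups values that differ only in case, punctuation, spacing or word order. The Python sketch below is a simplified approximation of a key-collision "fingerprint" approach, not Open Refine's own implementation; the function names `fingerprint` and `cluster` are illustrative.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a label to a normalised key so that near-identical labels collide."""
    # Fold accented characters to their closest ASCII form.
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    # Trim, lower-case and drop punctuation.
    cleaned = ascii_text.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Split into tokens, remove duplicate tokens, sort them and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one distinct form."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


clusters = cluster(["Webb, Beatrice", "Beatrice Webb", "Shaw, George Bernard"])
```

Each cluster returned is a candidate for renaming to a single preferred form, which is a manual decision in the workshop.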
Clean - Remove Duplicate rows
Sort on column with duplicates and reorder permanently
Facet duplicates to check
Watch for Open Refine switching from rows view to records view
Edit cells > Blank Down
Facet by blank
Remove all matching
The essence of Open Refine is using facets and filters to isolate
rows, then invoking commands that affect all of those rows together
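The sort, blank down and remove sequence can be sketched outside Open Refine as follows. This is a minimal Python illustration that assumes rows are dictionaries and that `key` names the column containing duplicates; `remove_duplicate_rows` is a hypothetical helper, not an Open Refine function.

```python
def remove_duplicate_rows(rows, key):
    """Mimic: sort on a column, blank down repeated values, remove blank rows."""
    # Step 1: sort on the column that contains duplicates.
    ordered = sorted(rows, key=lambda row: row[key])

    # Step 2: "blank down" - blank any cell that repeats the value directly above it.
    blanked = []
    previous = None
    for row in ordered:
        current = row[key]
        blanked.append({**row, key: "" if current == previous else current})
        previous = current

    # Step 3: facet by blank and remove all matching rows.
    return [row for row in blanked if row[key] != ""]


rows = [
    {"name": "Webb", "id": 1},
    {"name": "Shaw", "id": 2},
    {"name": "Webb", "id": 3},
]
deduplicated = remove_duplicate_rows(rows, "name")
```

As in the workshop steps, only the first row of each run of duplicates survives, and rows whose key cell was already blank are removed as well.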
URIs
LD Design Issues
Triples
http://www.w3.org/DesignIssues/LinkedData.html
Triples
Triple statements
»‘Things’ have ‘properties’ with ‘values’
»Subject - Predicate - Object
»Repository - Provides Access To - Archival Resource
»Jane Austen - Is Author Of - Pride and Prejudice
Triples are the basis of RDF and Linked Data
owl:sameAs
Hub Person - owl:sameAs - VIAF Person
<http://data.archiveshub.ac.uk/id/person/nra/webbmarthabeatrice1858-1943socialreformer>
owl:sameAs
<http://viaf.org/viaf/86607236> .
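The owl:sameAs statement on the slide above is a single triple, so it can be written out as one N-Triples line. The sketch below uses the two URIs from the slide and the full URI that the `owl:` prefix stands for; `same_as_statement` is an illustrative helper name.

```python
# Full URI behind the owl:sameAs shorthand.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_statement(local_uri: str, external_uri: str) -> str:
    """Return one N-Triples line: <subject> <predicate> <object> ."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


statement = same_as_statement(
    "http://data.archiveshub.ac.uk/id/person/nra/webbmarthabeatrice1858-1943socialreformer",
    "http://viaf.org/viaf/86607236",
)
print(statement)
```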
Matching Names to VIAF
You may need to join columns together, for example to give a more
consistent name form, e.g. using:
cells["FamilyName"].value + ", " + cells["GivenName"].value + ", " + cells["Dates"].value
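For comparison, the same join can be sketched in Python. It assumes each row is a dictionary keyed by the column names used in the expression above. Unlike that plain concatenation, this illustrative `join_name` helper skips empty or missing parts, so no stray separators are left behind.

```python
def join_name(row, columns=("FamilyName", "GivenName", "Dates"), separator=", "):
    """Join the chosen columns into one name string, skipping empty cells."""
    parts = [(row.get(column) or "").strip() for column in columns]
    return separator.join(part for part in parts if part)


full = join_name(
    {"FamilyName": "Webb", "GivenName": "Martha Beatrice", "Dates": "1858-1943"}
)
no_dates = join_name(
    {"FamilyName": "Webb", "GivenName": "Martha Beatrice", "Dates": None}
)
```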
Matching Names to VIAF
VIAF reconciliation service details at:
http://iphylo.blogspot.co.uk/2013/04/reconciling-author-names-using-open.html
You may need to add it as a ‘standard service’ under Reconcile >
Start reconciling. Service URL is:
http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_viaf.php
Other reconciliation services, e.g. LCSH, at:
https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
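When a standard service is added, Open Refine sends it batches of name queries. The sketch below only builds such a request body and sends nothing. It assumes the service accepts the common reconciliation API convention of a `queries` form parameter holding a JSON object keyed `q0`, `q1`, and so on; the helper name and the `limit` default are illustrative.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_reconciliation_request(names, limit=3):
    """Build a form-encoded body with one query per name (assumed batch format)."""
    queries = {
        f"q{index}": {"query": name, "limit": limit}
        for index, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(queries)})


body = build_reconciliation_request(["Webb, Beatrice, 1858-1943"])
# Decode the body again to show what the service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
```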
RDF Export
Download the RDF Refine extension from http://refine.deri.ie/
Unzip
Open Project > Browse workspace directory
Create an ‘extensions’ folder (if it doesn’t exist)
Copy the unzipped RDF Refine folder into the ‘extensions’ folder in the workspace directory
Restart Open Refine
Need to create a column with VIAF URIs for export:
"http://viaf.org/viaf/" + cell.recon.match.id
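The VIAF URI column built on the RDF Export slide can be sketched in Python as below. It assumes the matched VIAF identifier is already available as a string or number, and it returns `None` for unmatched cells instead of failing; `viaf_uri` is an illustrative name.

```python
VIAF_BASE = "http://viaf.org/viaf/"


def viaf_uri(match_id):
    """Return the VIAF URI for a reconciled identifier, or None if there is no match."""
    if match_id is None or str(match_id).strip() == "":
        return None
    return VIAF_BASE + str(match_id).strip()


uris = [viaf_uri(match_id) for match_id in ["86607236", None, ""]]
```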
Matching Subjects to LCSH
Click the RDF button in the top right corner, select ‘Add reconciliation
service’, then ‘Based on SPARQL endpoint’.
Add the following parameters:
Name: LCSH
Endpoint URL: http://sparql.freeyourmetadata.org/
Graph URI: http://id.loc.gov/authorities/subjects
Type: Virtuoso
Label properties: check only skos:prefLabel
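To show what these parameters mean, the sketch below builds a SPARQL query that looks for subjects in the LCSH graph whose skos:prefLabel matches a term, ignoring case. It is an illustration only: the query that the RDF extension actually sends is likely to differ, and `build_label_query` and its `limit` parameter are hypothetical.

```python
LCSH_GRAPH = "http://id.loc.gov/authorities/subjects"


def build_label_query(label, graph=LCSH_GRAPH, limit=5):
    """Return a SPARQL query matching skos:prefLabel against a label, case-insensitively."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?subject ?label WHERE {\n"
        f"  GRAPH <{graph}> {{\n"
        "    ?subject skos:prefLabel ?label .\n"
        f'    FILTER (LCASE(STR(?label)) = LCASE("{escaped}"))\n'
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = build_label_query("Socialism")
print(query)
```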
Martha Beatrice Webb
Place of birth: Gloucester, England
Place of death: Liphook, Hampshire, England
Life dates: 1858-1943
Epithet: social reformer and historian
Family name: Webb
Image
Biographical Notes
from: Beatrice Webb letters
Beatrice Webb (1858 - 1943). Fabian Socialist, social reformer, writer,
historian, diarist. Wife, collaborator and assistant of Sidney Webb,
later Lord Passfield. Together they contributed to the radical
ideology first of the Liberal Party and later of the Labour Party.
from: Beatrice Webb, A summer holiday in Scotland, 1884.
Beatrice Webb (1858-1943), nee Potter, social reformer and diarist.
Married to Sidney Webb, pioneers of social science. She was
involved in many spheres of political and social activity including the
Labour Party, Fabianism, social observation, investigations into
poverty, development of socialism, the foundation of the National
Health Service and post war welfare state, the London School of
Works
Our Partnership
My Apprenticeship
The case for the factory acts
Beatrice Webb’s diaries; edited by Margaret Cole
The Diary
Knows
http://dbpedia.org/page/George_Bernard_Shaw
http://dbpedia.org/page/Sidney_Webb,_1st_Baron_Passfield
Contact
Adrian Stevenson
Senior Technical Coordinator
Jisc Manchester
http://www.jisc.ac.uk
adrian.stevenson@jisc.ac.uk
http://www.twitter.com/adrianstevenson
https://www.linkedin.com/in/adrianstevenson
CC License
This presentation is available under a Creative Commons Non
Commercial-Share Alike licence:
http://creativecommons.org/licenses/by-nc/2.0/uk/
Ad

Recommended

Linking Data with sameAs: Challenges and Solutions - Workshop
Linking Data with sameAs: Challenges and Solutions - Workshop
Adrian Stevenson
 
Libraries and Linked open Data
Libraries and Linked open Data
Johannes Keizer
 
Linked Data : Cataloguing and a World Wide Web of Data
Linked Data : Cataloguing and a World Wide Web of Data
Thomas Meehan
 
Educon: History, History
Educon: History, History
visiblehistory
 
Educon2.3 History, history
Educon2.3 History, history
visiblehistory
 
Linked Data vs. APIs, presentation at EmTACL 2012
Linked Data vs. APIs, presentation at EmTACL 2012
Jane Stevenson
 
Rijpma et al. 2019 - record linkage in the cape of good hope panel
Rijpma et al. 2019 - record linkage in the cape of good hope panel
Bram van den Hout
 
Hack a LOD Schalk Clariah WP4
Hack a LOD Schalk Clariah WP4
Ruben Schalk
 
Open Data
Open Data
datable_be
 
ESDG seminar 2019: reconstructing a country
ESDG seminar 2019: reconstructing a country
Rick Mourits
 
It's the end of the world as we know it, and i feel fine
It's the end of the world as we know it, and i feel fine
Martin Hamilton
 
The Winner Takes it All? -APIs and Linked Data Battle It Out
The Winner Takes it All? -APIs and Linked Data Battle It Out
Adrian Stevenson
 
2011 11 grdi-presentation
2011 11 grdi-presentation
Johannes Keizer
 
2011 jisc rdtf teresa the womens library
2011 jisc rdtf teresa the womens library
Teresa Doherty
 
Clariah WP4 dataLegend data stories
Clariah WP4 dataLegend data stories
Ruben Schalk
 
Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Onl...
Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Onl...
Peter Broadwell
 
CKAN intro for Estonian open data workshop
CKAN intro for Estonian open data workshop
Irina Bolychevsky
 
Esshc presentation ashkan
Esshc presentation ashkan
Bram van den Hout
 
ProQuest: The Road to Open Access - An Aggregator Journey (LundOnline 2014)
ProQuest: The Road to Open Access - An Aggregator Journey (LundOnline 2014)
ProQuest
 
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Allen Press
 
Linked open data and libraries
Linked open data and libraries
Alison Hitchens
 
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
Alison Hitchens
 
What is #LODLAM?! (revised January 2015)
What is #LODLAM?! (revised January 2015)
Alison Hitchens
 
Internal meeting: An introduction to the civil registry & LINKS
Internal meeting: An introduction to the civil registry & LINKS
Rick Mourits
 
Cultural Heritage Information Dashboards
Cultural Heritage Information Dashboards
Richard Urban
 
The Past's Present Future: Emerging Trends in Online Cultural Heritage
The Past's Present Future: Emerging Trends in Online Cultural Heritage
Richard Urban
 
Exploring British Design
Exploring British Design
Adrian Stevenson
 
Promotion of Scientific Output : made possible by your library
Promotion of Scientific Output : made possible by your library
Guus van den Brekel
 
The Cutting Edge of SWORD
The Cutting Edge of SWORD
Adrian Stevenson
 
Linked Data and the Semantic Web: What Are They and Should I Care?
Linked Data and the Semantic Web: What Are They and Should I Care?
Adrian Stevenson
 

More Related Content

What's hot (18)

Open Data
Open Data
datable_be
 
ESDG seminar 2019: reconstructing a country
ESDG seminar 2019: reconstructing a country
Rick Mourits
 
It's the end of the world as we know it, and i feel fine
It's the end of the world as we know it, and i feel fine
Martin Hamilton
 
The Winner Takes it All? -APIs and Linked Data Battle It Out
The Winner Takes it All? -APIs and Linked Data Battle It Out
Adrian Stevenson
 
2011 11 grdi-presentation
2011 11 grdi-presentation
Johannes Keizer
 
2011 jisc rdtf teresa the womens library
2011 jisc rdtf teresa the womens library
Teresa Doherty
 
Clariah WP4 dataLegend data stories
Clariah WP4 dataLegend data stories
Ruben Schalk
 
Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Onl...
Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Onl...
Peter Broadwell
 
CKAN intro for Estonian open data workshop
CKAN intro for Estonian open data workshop
Irina Bolychevsky
 
Esshc presentation ashkan
Esshc presentation ashkan
Bram van den Hout
 
ProQuest: The Road to Open Access - An Aggregator Journey (LundOnline 2014)
ProQuest: The Road to Open Access - An Aggregator Journey (LundOnline 2014)
ProQuest
 
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Allen Press
 
Linked open data and libraries
Linked open data and libraries
Alison Hitchens
 
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
Alison Hitchens
 
What is #LODLAM?! (revised January 2015)
What is #LODLAM?! (revised January 2015)
Alison Hitchens
 
Internal meeting: An introduction to the civil registry & LINKS
Internal meeting: An introduction to the civil registry & LINKS
Rick Mourits
 
Cultural Heritage Information Dashboards
Cultural Heritage Information Dashboards
Richard Urban
 
The Past's Present Future: Emerging Trends in Online Cultural Heritage
The Past's Present Future: Emerging Trends in Online Cultural Heritage
Richard Urban
 
ESDG seminar 2019: reconstructing a country
ESDG seminar 2019: reconstructing a country
Rick Mourits
 
It's the end of the world as we know it, and i feel fine
It's the end of the world as we know it, and i feel fine
Martin Hamilton
 
The Winner Takes it All? -APIs and Linked Data Battle It Out
The Winner Takes it All? -APIs and Linked Data Battle It Out
Adrian Stevenson
 
2011 11 grdi-presentation
2011 11 grdi-presentation
Johannes Keizer
 
2011 jisc rdtf teresa the womens library
2011 jisc rdtf teresa the womens library
Teresa Doherty
 
Clariah WP4 dataLegend data stories
Clariah WP4 dataLegend data stories
Ruben Schalk
 
Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Onl...
Conspiracy Stories: Building Archives to Facilitate Narrative Analyses of Onl...
Peter Broadwell
 
CKAN intro for Estonian open data workshop
CKAN intro for Estonian open data workshop
Irina Bolychevsky
 
ProQuest: The Road to Open Access - An Aggregator Journey (LundOnline 2014)
ProQuest: The Road to Open Access - An Aggregator Journey (LundOnline 2014)
ProQuest
 
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Allen Press
 
Linked open data and libraries
Linked open data and libraries
Alison Hitchens
 
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
Alison Hitchens
 
What is #LODLAM?! (revised January 2015)
What is #LODLAM?! (revised January 2015)
Alison Hitchens
 
Internal meeting: An introduction to the civil registry & LINKS
Internal meeting: An introduction to the civil registry & LINKS
Rick Mourits
 
Cultural Heritage Information Dashboards
Cultural Heritage Information Dashboards
Richard Urban
 
The Past's Present Future: Emerging Trends in Online Cultural Heritage
The Past's Present Future: Emerging Trends in Online Cultural Heritage
Richard Urban
 

Viewers also liked (18)

Exploring British Design
Exploring British Design
Adrian Stevenson
 
Promotion of Scientific Output : made possible by your library
Promotion of Scientific Output : made possible by your library
Guus van den Brekel
 
The Cutting Edge of SWORD
The Cutting Edge of SWORD
Adrian Stevenson
 
Linked Data and the Semantic Web: What Are They and Should I Care?
Linked Data and the Semantic Web: What Are They and Should I Care?
Adrian Stevenson
 
Linked Data and the Semantic Web - What Are They and Should I Care?
Linked Data and the Semantic Web - What Are They and Should I Care?
Adrian Stevenson
 
High and Lows of Library Linked Data
High and Lows of Library Linked Data
Adrian Stevenson
 
Very Gentle Linked Data Workshop
Very Gentle Linked Data Workshop
Adrian Stevenson
 
Clearspace Demonstration
Clearspace Demonstration
Adrian Stevenson
 
The Story of How an Oracle Classic Stronghold successfully embraced SOA
The Story of How an Oracle Classic Stronghold successfully embraced SOA
Lucas Jellema
 
Visualization - how one picture beats a 1000 words - and how to leverage that
Visualization - how one picture beats a 1000 words - and how to leverage that
Lucas Jellema
 
Inheritance
Inheritance
guest151171
 
Locah Project Show and Tell
Locah Project Show and Tell
Adrian Stevenson
 
Lessons from ‘Linking Lives’ and ‘WW1 Discovery’ Projects
Lessons from ‘Linking Lives’ and ‘WW1 Discovery’ Projects
Adrian Stevenson
 
Use Cases Vs User Stories
Use Cases Vs User Stories
Gennady Borukhovich
 
Data manipulation instructions
Data manipulation instructions
Mahesh Kumar Attri
 
31 Case Studies on Conversion Optimization
31 Case Studies on Conversion Optimization
Kissmetrics on SlideShare
 
From Use case to User Story
From Use case to User Story
Kunta Hutabarat
 
Data transfer and manipulation
Data transfer and manipulation
Sanjeev Patel
 
Promotion of Scientific Output : made possible by your library
Promotion of Scientific Output : made possible by your library
Guus van den Brekel
 
Linked Data and the Semantic Web: What Are They and Should I Care?
Linked Data and the Semantic Web: What Are They and Should I Care?
Adrian Stevenson
 
Linked Data and the Semantic Web - What Are They and Should I Care?
Linked Data and the Semantic Web - What Are They and Should I Care?
Adrian Stevenson
 
High and Lows of Library Linked Data
High and Lows of Library Linked Data
Adrian Stevenson
 
Very Gentle Linked Data Workshop
Very Gentle Linked Data Workshop
Adrian Stevenson
 
The Story of How an Oracle Classic Stronghold successfully embraced SOA
The Story of How an Oracle Classic Stronghold successfully embraced SOA
Lucas Jellema
 
Visualization - how one picture beats a 1000 words - and how to leverage that
Visualization - how one picture beats a 1000 words - and how to leverage that
Lucas Jellema
 
Locah Project Show and Tell
Locah Project Show and Tell
Adrian Stevenson
 
Lessons from ‘Linking Lives’ and ‘WW1 Discovery’ Projects
Lessons from ‘Linking Lives’ and ‘WW1 Discovery’ Projects
Adrian Stevenson
 
Data manipulation instructions
Data manipulation instructions
Mahesh Kumar Attri
 
From Use case to User Story
From Use case to User Story
Kunta Hutabarat
 
Data transfer and manipulation
Data transfer and manipulation
Sanjeev Patel
 
Ad

Similar to Tools for Data Manipulation - UKAD Open Refine Workshop (8)

Linked dataworkshopintro14aug2014
Linked dataworkshopintro14aug2014
Jane Stevenson
 
Data Wrangling with Open Refine
Data Wrangling with Open Refine
LOUIS Libraries
 
"Il n´y a pas de hors-texte": challenges for Archival Linked Data. Adrian Ste...
"Il n´y a pas de hors-texte": challenges for Archival Linked Data. Adrian Ste...
Biblioteca Nacional de España
 
“Il n’y a pas de hors-texte” - Challenges for Archival Linked Data
“Il n’y a pas de hors-texte” - Challenges for Archival Linked Data
Adrian Stevenson
 
TXDHC OpenRefine Training
TXDHC OpenRefine Training
Liz Grumbach
 
Linked Open Data: Opportunities & Barriers for Archives
Linked Open Data: Opportunities & Barriers for Archives
Adrian Stevenson
 
OpenRefine
OpenRefine
Georgia Libraries Conference (formerly Ga COMO).
 
Aggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project Experiences
Adrian Stevenson
 
Linked dataworkshopintro14aug2014
Linked dataworkshopintro14aug2014
Jane Stevenson
 
Data Wrangling with Open Refine
Data Wrangling with Open Refine
LOUIS Libraries
 
"Il n´y a pas de hors-texte": challenges for Archival Linked Data. Adrian Ste...
"Il n´y a pas de hors-texte": challenges for Archival Linked Data. Adrian Ste...
Biblioteca Nacional de España
 
“Il n’y a pas de hors-texte” - Challenges for Archival Linked Data
“Il n’y a pas de hors-texte” - Challenges for Archival Linked Data
Adrian Stevenson
 
TXDHC OpenRefine Training
TXDHC OpenRefine Training
Liz Grumbach
 
Linked Open Data: Opportunities & Barriers for Archives
Linked Open Data: Opportunities & Barriers for Archives
Adrian Stevenson
 
Aggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project Experiences
Adrian Stevenson
 
Ad

More from Adrian Stevenson (18)

SEO Matters
SEO Matters
Adrian Stevenson
 
Wrapping and Unwrapping History: What’s Gained and What’s Lost
Wrapping and Unwrapping History: What’s Gained and What’s Lost
Adrian Stevenson
 
Digital Humanities and the First World War
Digital Humanities and the First World War
Adrian Stevenson
 
Introduction to APIs and Linked Data
Introduction to APIs and Linked Data
Adrian Stevenson
 
GLAM Rocks! London Semantic Web Meetup
GLAM Rocks! London Semantic Web Meetup
Adrian Stevenson
 
Linked Data - the Future for Open Repositories. Kultivate Workshop
Linked Data - the Future for Open Repositories. Kultivate Workshop
Adrian Stevenson
 
2 minutes on LOCAH Linking Lives at Europeana Tech 2011
2 minutes on LOCAH Linking Lives at Europeana Tech 2011
Adrian Stevenson
 
Report on the International Linked Open Data for Libraries, Archives and Muse...
Report on the International Linked Open Data for Libraries, Archives and Muse...
Adrian Stevenson
 
Linked Data - the Future for Open Repositories?
Linked Data - the Future for Open Repositories?
Adrian Stevenson
 
LOCAH Project and Considerations of Linked Data Approaches
LOCAH Project and Considerations of Linked Data Approaches
Adrian Stevenson
 
Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data
Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data
Adrian Stevenson
 
RDFa From Theory to Practice
RDFa From Theory to Practice
Adrian Stevenson
 
Linked Data and the Semantic Web - Mimas Seminar
Linked Data and the Semantic Web - Mimas Seminar
Adrian Stevenson
 
Semantic Technologies: Which Way Now? – UKOLN Response
Semantic Technologies: Which Way Now? – UKOLN Response
Adrian Stevenson
 
SWORD 3 Kick-off Meeting
SWORD 3 Kick-off Meeting
Adrian Stevenson
 
Making Repository Easier With SWORD
Making Repository Easier With SWORD
Adrian Stevenson
 
SWORD: The Story So Far
SWORD: The Story So Far
Adrian Stevenson
 
SWORD: An Overview
SWORD: An Overview
Adrian Stevenson
 
Wrapping and Unwrapping History: What’s Gained and What’s Lost
Wrapping and Unwrapping History: What’s Gained and What’s Lost
Adrian Stevenson
 
Digital Humanities and the First World War
Digital Humanities and the First World War
Adrian Stevenson
 
Introduction to APIs and Linked Data
Introduction to APIs and Linked Data
Adrian Stevenson
 
GLAM Rocks! London Semantic Web Meetup
GLAM Rocks! London Semantic Web Meetup
Adrian Stevenson
 
Linked Data - the Future for Open Repositories. Kultivate Workshop
Linked Data - the Future for Open Repositories. Kultivate Workshop
Adrian Stevenson
 
2 minutes on LOCAH Linking Lives at Europeana Tech 2011
2 minutes on LOCAH Linking Lives at Europeana Tech 2011
Adrian Stevenson
 
Report on the International Linked Open Data for Libraries, Archives and Muse...
Report on the International Linked Open Data for Libraries, Archives and Muse...
Adrian Stevenson
 
Linked Data - the Future for Open Repositories?
Linked Data - the Future for Open Repositories?
Adrian Stevenson
 
LOCAH Project and Considerations of Linked Data Approaches
LOCAH Project and Considerations of Linked Data Approaches
Adrian Stevenson
 
Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data
Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data
Adrian Stevenson
 
RDFa From Theory to Practice
RDFa From Theory to Practice
Adrian Stevenson
 
Linked Data and the Semantic Web - Mimas Seminar
Linked Data and the Semantic Web - Mimas Seminar
Adrian Stevenson
 
Semantic Technologies: Which Way Now? – UKOLN Response
Semantic Technologies: Which Way Now? – UKOLN Response
Adrian Stevenson
 
Making Repository Easier With SWORD
Making Repository Easier With SWORD
Adrian Stevenson
 

Recently uploaded (20)

“THE BEST CLASS IN SCHOOL”. _
“THE BEST CLASS IN SCHOOL”. _
Colégio Santa Teresinha
 
Battle of Bookworms 2025 - U25 Literature Quiz by Pragya
Battle of Bookworms 2025 - U25 Literature Quiz by Pragya
Pragya - UEM Kolkata Quiz Club
 
Paper 109 | Archetypal Journeys in ‘Interstellar’: Exploring Universal Themes...
Paper 109 | Archetypal Journeys in ‘Interstellar’: Exploring Universal Themes...
Rajdeep Bavaliya
 
ICT-8-Module-REVISED-K-10-CURRICULUM.pdf
ICT-8-Module-REVISED-K-10-CURRICULUM.pdf
penafloridaarlyn
 
Basic English for Communication - Dr Hj Euis Eti Rohaeti Mpd
Basic English for Communication - Dr Hj Euis Eti Rohaeti Mpd
Restu Bias Primandhika
 
Revista digital preescolar en transformación
Revista digital preescolar en transformación
guerragallardo26
 
FIRST DAY HIGH orientation for mapeh subject in grade 10.pptx
FIRST DAY HIGH orientation for mapeh subject in grade 10.pptx
GlysdiEelesor1
 
What are the benefits that dance brings?
What are the benefits that dance brings?
memi27
 
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Rajdeep Bavaliya
 
How to Implement Least Package Removal Strategy in Odoo 18 Inventory
How to Implement Least Package Removal Strategy in Odoo 18 Inventory
Celine George
 
How to Manage Inventory Movement in Odoo 18 POS
How to Manage Inventory Movement in Odoo 18 POS
Celine George
 
Non-Communicable Diseases and National Health Programs – Unit 10 | B.Sc Nursi...
Non-Communicable Diseases and National Health Programs – Unit 10 | B.Sc Nursi...
RAKESH SAJJAN
 
Chalukyas of Gujrat, Solanki Dynasty NEP.pptx
Chalukyas of Gujrat, Solanki Dynasty NEP.pptx
Dr. Ravi Shankar Arya Mahila P. G. College, Banaras Hindu University, Varanasi, India.
 
Wax Moon, Richmond, VA. Terrence McPherson
Wax Moon, Richmond, VA. Terrence McPherson
TerrenceMcPherson1
 
ROLE PLAY: FIRST AID -CPR & RECOVERY POSITION.pptx
ROLE PLAY: FIRST AID -CPR & RECOVERY POSITION.pptx
Belicia R.S
 
Introduction to problem solving Techniques
Introduction to problem solving Techniques
merlinjohnsy
 
How to Configure Vendor Management in Lunch App of Odoo 18
How to Configure Vendor Management in Lunch App of Odoo 18
Celine George
 
Exploring Ocean Floor Features for Middle School
Exploring Ocean Floor Features for Middle School
Marie
 
Introduction to Generative AI and Copilot.pdf
Introduction to Generative AI and Copilot.pdf
TechSoup
 
Sustainable Innovation with Immersive Learning
Sustainable Innovation with Immersive Learning
Leonel Morgado
 
Battle of Bookworms 2025 - U25 Literature Quiz by Pragya
Battle of Bookworms 2025 - U25 Literature Quiz by Pragya
Pragya - UEM Kolkata Quiz Club
 
Paper 109 | Archetypal Journeys in ‘Interstellar’: Exploring Universal Themes...
Paper 109 | Archetypal Journeys in ‘Interstellar’: Exploring Universal Themes...
Rajdeep Bavaliya
 
ICT-8-Module-REVISED-K-10-CURRICULUM.pdf
ICT-8-Module-REVISED-K-10-CURRICULUM.pdf
penafloridaarlyn
 
Basic English for Communication - Dr Hj Euis Eti Rohaeti Mpd
Basic English for Communication - Dr Hj Euis Eti Rohaeti Mpd
Restu Bias Primandhika
 
Revista digital preescolar en transformación
Revista digital preescolar en transformación
guerragallardo26
 
FIRST DAY HIGH orientation for mapeh subject in grade 10.pptx
FIRST DAY HIGH orientation for mapeh subject in grade 10.pptx
GlysdiEelesor1
 
What are the benefits that dance brings?
What are the benefits that dance brings?
memi27
 
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Paper 107 | From Watchdog to Lapdog: Ishiguro’s Fiction and the Rise of “Godi...
Rajdeep Bavaliya
 
How to Implement Least Package Removal Strategy in Odoo 18 Inventory
How to Implement Least Package Removal Strategy in Odoo 18 Inventory
Celine George
 
How to Manage Inventory Movement in Odoo 18 POS
How to Manage Inventory Movement in Odoo 18 POS
Celine George
 
Non-Communicable Diseases and National Health Programs – Unit 10 | B.Sc Nursi...
Non-Communicable Diseases and National Health Programs – Unit 10 | B.Sc Nursi...
RAKESH SAJJAN
 
Wax Moon, Richmond, VA. Terrence McPherson
Wax Moon, Richmond, VA. Terrence McPherson
TerrenceMcPherson1
 
ROLE PLAY: FIRST AID -CPR & RECOVERY POSITION.pptx
ROLE PLAY: FIRST AID -CPR & RECOVERY POSITION.pptx
Belicia R.S
 
Introduction to problem solving Techniques
Introduction to problem solving Techniques
merlinjohnsy
 
How to Configure Vendor Management in Lunch App of Odoo 18
How to Configure Vendor Management in Lunch App of Odoo 18
Celine George
 
Exploring Ocean Floor Features for Middle School
Exploring Ocean Floor Features for Middle School
Marie
 
Introduction to Generative AI and Copilot.pdf
Introduction to Generative AI and Copilot.pdf
TechSoup
 
Sustainable Innovation with Immersive Learning
Sustainable Innovation with Immersive Learning
Leonel Morgado
 

Tools for Data Manipulation - UKAD Open Refine Workshop

  • 1. Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester Tools for Data Manipulation UKAD Open RefineWorkshop, Jisc London, 18th March 2016
  • 2. Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 2 Workshop Resources Available from: http://data.archiveshub.ac.uk/workshops/ukad2016/readme.html Link to Open Refine and plugins Link to example data used for workshop Link to completed Open Refine project from todays workshop
  • 3. Open Refine OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data. Main Uses: • Explore data • Clean and transform data • Reconcile and match data Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 3
  • 4. Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 4 Installing and running Open Refine Download from: http://openrefine.org/download.html Run and in a web browser go to: http://127.0.0.1:3333/ Select ‘create project’ and browse for Archives Hub example csv data file Note: May need to clear browser cache to see new projects
  • 5. Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 5 Clean andTransform - Facets and Clustering Strip white space Transform Upper case, title case Split multi valued cells or Edit col > Split several cols Facet on label Order by count Cluster and rename rows Undo
  • 6. Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 6 Clean - Remove Duplicate rows Sort on column with duplicates and reorder permanently Facet duplicates to check Watch for OR switching from rows to records view Edit cells > Blank Down Facet by blank Remove all matching Essence of Open Refine is using facets and filters to isolate rows and invoke commands to affect all these rows together
  • 7. Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 7
  • 9. Triples Triples statements »‘Things’ have ‘properties’ with ‘values’ »Subject – Predicate - Object Archival Resource Repository Provides Access To Pride and Prejudice Jane Austen Is Author Of Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 9 Triples are the basis of RDF and Linked Data
  • 10. owl:sameAs Hub Person - owl:sameAs -VIAF Person <http://data.archiveshub.ac.uk/id/person/nra/webbma rthabeatrice1858-1943socialreformer> owl:sameAs <http://viaf.org/viaf/86607236> . Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 10
  • 11. Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/ 11 Matching Names toVIAF May need to join columns together, for example to give more consistent name form, e.g using: cells["FamilyName"].value + ", " + cells["GivenName"].value + ", " + cells["Dates"].value
  • 12. Matching Names to VIAF: VIAF reconciliation service details at: http://iphylo.blogspot.co.uk/2013/04/reconciling-author-names-using-open.html You may need to add it as a ‘standard service’ under Reconcile > Start reconciling. The service URL is: http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_viaf.php Other reconciliation services, e.g. LCSH, are listed at: https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
  • 13. RDF Export: Download the RDF Refine extension from http://refine.deri.ie/ and unzip it. Go to Open Project > Browse workspace directory. Create an ‘extensions’ folder (if it doesn’t exist). Copy the unzipped RDF Refine folder to the workspace directory. Restart Open Refine. You need to create a column with VIAF URIs for export: "http://viaf.org/viaf/"+cell.recon.match.id
  • 14. Matching Subjects to LCSH: Click the RDF button in the top right corner and select ‘Add reconciliation service’, then ‘Based on SPARQL endpoint’. Add the following parameters: Name: LCSH. Endpoint URL: http://sparql.freeyourmetadata.org/ Graph URI: http://id.loc.gov/authorities/subjects Type: Virtuoso. Label properties: check only skos:prefLabel.
  • 15. Martha Beatrice Webb. Place of birth: Gloucester, England. Place of death: Liphook, Hampshire, England. Life dates: 1858-1943. Epithet: social reformer and historian. Family name: Webb. Image from: Beatrice Webb letters. Biographical Notes: Beatrice Webb (1858 - 1943). Fabian Socialist, social reformer, writer, historian, diarist. Wife, collaborator and assistant of Sidney Webb, later Lord Passfield. Together they contributed to the radical ideology first of the Liberal Party and later of the Labour Party. from: Beatrice Webb, A summer holiday in Scotland, 1884. Beatrice Webb (1858-1943), nee Potter, social reformer and diarist. Married to Sidney Webb, pioneers of social science. She was involved in many spheres of political and social activity including the Labour Party, Fabianism, social observation, investigations into poverty, development of socialism, the foundation of the National Health Service and post war welfare state, the London School of … Works: Our Partnership; My Apprenticeship; The case for the factory acts; Beatrice Webb’s diaries, edited by Margaret Cole; The Diary. Knows: http://dbpedia.org/page/George_Bernard_Shaw http://dbpedia.org/page/Sidney_Webb,_1st_Baron_Passfield
  • 16. Contact: Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester. http://www.jisc.ac.uk [email protected] http://www.twitter.com/adrianstevenson https://www.linkedin.com/in/adrianstevenson
  • 17. CC License: This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/
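
The name-joining expression on slide 11, the VIAF URI expression on slide 13 and the owl:sameAs statement on slide 10 are written for use inside Open Refine. As a rough illustration of what they compute, here is a minimal Python sketch that runs outside Open Refine. The function names are hypothetical, and skipping blank name parts is an assumption of this sketch, not something the slide expression does.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of Open Refine or of the workshop materials.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
VIAF_BASE = "http://viaf.org/viaf/"


def join_name(family_name, given_name, dates):
    """Join name parts with ", " to give one consistent name form.

    Assumption: blank or missing parts are skipped, so a row with an
    empty cell still produces a usable string.
    """
    parts = (family_name, given_name, dates)
    return ", ".join(p.strip() for p in parts if p and p.strip())


def viaf_uri(viaf_id):
    """Build a VIAF URI from a matched identifier (slide 13)."""
    return VIAF_BASE + str(viaf_id)


def same_as_triple(hub_uri, matched_uri):
    """Format one owl:sameAs statement as an N-Triples line (slide 10)."""
    return "<{0}> <{1}> <{2}> .".format(hub_uri, OWL_SAME_AS, matched_uri)


if __name__ == "__main__":
    hub = ("http://data.archiveshub.ac.uk/id/person/nra/"
           "webbmarthabeatrice1858-1943socialreformer")
    print(join_name("Webb", "Martha Beatrice", "1858-1943"))
    print(same_as_triple(hub, viaf_uri(86607236)))
```

Skipping blanks avoids a failed merge when a cell is empty, but it does not solve the problem of names with an inconsistent number of commas, so joined names still need checking before reconciliation.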

Editor's Notes

  • #4: Hub used mainly for linked data project where we wanted to match to VIAF. Will come to later in the workshop.
  • #5: Review options on import screen. Talk through the example data and the purpose of the columns.
  • #6: Facet
  • #7: Mention that a facet on duplicates for person URI doesn’t necessarily mean you want to remove the rows, as the Arc Res URIs may be different. Depends on what you want to do. More tutorials: http://kb.refinepro.com/2011/08/remove-duplicate.html http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial#Deduplicate_entries
  • #8: Explain why you might want to reconcile to VIAF. Other reconciliation services at https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
  • #9: http://www.w3.org/DesignIssues/LinkedData.html
  • #12: If any of the cells in the columns are blank, the merge will fail for that row. To fix, create a facet of blank cells with "Text Facet" ⇒ "Customized Facets" ⇒ "Facet by Blank". Then use "Edit Cells" ⇒ "Transform ..." and enter a string with a space: ' '. This also has its limitations, as some names have an inconsistent number of commas.
  • #13: Talk through faceting of judgement. How to check and accept reconciled rows. Explain that this is why the Hub URI and ArcRes URI are included: for manual checking.
  • #16: Mock-up of the Linking Lives interface shows the way data is brought together.
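
The duplicate-removal steps on slide 6 (sort on the column, Blank down, Facet by blank, remove all matching rows) amount to keeping only the first row of each run of equal values after sorting. A minimal Python sketch of that logic follows, assuming rows are held as a list of dictionaries. The function name and the column names in the usage example are hypothetical.

```python
# Illustrative sketch only: mimics the effect of the Open Refine steps on a
# plain list of dictionaries. It is not how Open Refine is implemented.

def remove_duplicate_rows(rows, column):
    """Return rows with repeated values in `column` removed.

    Steps mirrored from the slide:
      1. sort on the column (Python's sort is stable, so rows with the
         same value keep their original relative order);
      2. "Blank down": a row whose value equals the previous row's value
         counts as a repeat;
      3. "Facet by blank" and "Remove all matching rows": repeats are dropped.
    """
    ordered = sorted(rows, key=lambda row: row[column])
    kept = []
    previous = None
    seen_first = False
    for row in ordered:
        value = row[column]
        is_repeat = seen_first and value == previous
        if not is_repeat:
            kept.append(row)
        previous = value
        seen_first = True
    return kept


if __name__ == "__main__":
    sample = [
        {"person": "b", "resource": 1},
        {"person": "a", "resource": 2},
        {"person": "b", "resource": 3},
    ]
    for row in remove_duplicate_rows(sample, "person"):
        print(row)
```

As note #7 says, a repeated person URI is not always a row to discard: in the sample above the two "b" rows point at different resources, so whether to drop one depends on what you want to do with the data.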