Putting Open Data on Open Source Platforms
Sharing Development Experience with CKAN, an Open Source Data Portal Platform
@ FOSS and Project Collaboration (Spring 2015)
This work is licensed under a
Creative Commons Attribution-ShareAlike 3.0 Taiwan License.
Presenter: 李承錱 Cheng-Jen Lee (Sol)
Email: cjlee AT iis.sinica.edu.tw
2
About Me
●
Sol, @u10313335
●
Institute of Information Science, Academia Sinica
●
https://about.me/SolLee
●
Python / R / Java
●
Focused Areas
– CMS
– Data Repository
– Open Data
– *nix System Administration
3
Agenda
●
Open Data and Open Data Portals
●
About CKAN
●
CKAN and 5 Open Data★
●
Experiences
●
Contribution: What and How?
4
Open Data and Open Data Portals
●
Open Data
– The idea that certain data should be freely available to
everyone to use and republish as they wish, without
restrictions from copyright, patents or other
mechanisms of control [1].
●
Open Data Portals
– Facilitate access to and re-use of public sector
information [2].
– “Infrastructure” of open data
1. Wikipedia: open data https://en.wikipedia.org/wiki/Open_data
2. Open Data Portals - Digital Agenda for Europe
http://ec.europa.eu/digital-agenda/en/open-data-portals
5
About CKAN
6
CKAN
●
The Comprehensive Knowledge Archive
Network
●
A powerful data management system
– Publishing
– Sharing
– Finding
– Using Data
7
Screenshot
8
The Most Popular Platform for
Open Data
116 instances
around the world
in March 2015
http://ckan.org/instances
9
The Most Popular Platform for
Open Data
●
Widely used in government data portals
– In EU member states, 30% of open data portals have adopted
CKAN (OpenDataMonitor [1], March 2015)
●
Workflow support for publishing data
●
Data Visualization
●
100+ Extensions
●
Powerful APIs
●
Open-sourced (AGPLv3)
1. http://www.opendatamonitor.eu
10
United Kingdom
DATA.GOV.UK
11
United States
DATA.GOV
12
Japan
DATA.GO.JP
13
European Union
PUBLICDATA.EU
14
Tainan City
DATA.TAINAN.GOV.TW
15
Nantou County
DATA.NANTOU.GOV.TW
16
Hsinchu City
OPENDATA.HCCG.GOV.TW
17
Taipei City
DATA.TAIPEI
18
Taijiang Inner Sea Research Datasets (台江內海研究資料集)
TAIJIANG.TW
19
Demo Site
demo.ckan.org
20
Publish Datasets
① Add Dataset Information
21
Publish Datasets
② Add Data under the Dataset
22
Find Datasets
By Keyword
23
Find Datasets
By Location
24
Find Datasets
By filters
25
Data Preview and Visualization
recline_view (csv, xls)
Grid
26
Data Preview and Visualization
recline_view (csv, xls)
Graph
27
Data Preview and Visualization
recline_view (csv, xls)
Lat/Long fields
28
Data Preview and Visualization
wms_preview
29
Data Preview and Visualization
geojson_preview
30
Data Preview and Visualization
●
Docs: recline_view, text_view, json_view, pdf_view,
webpage_view, officedocs_view...
●
Pics: image_view
●
And more!
31
Authorization
organization
http://opendata.hccg.gov.tw/organization
32
Data Exchange
Harvest and Federation
33
CKAN and 5 Open Data★ [1]
1. Tim Berners-Lee, “Linked Data”
http://www.w3.org/DesignIssues/LinkedData.html
34
CKAN and 5 Open Data★
●
★ Make your stuff available on the Web (whatever
format) under an open license
Customizable licenses
35
CKAN and 5 Open Data★
●
★★ Make it available as structured data (e.g., Excel
instead of image scan of a table)
★★★ Use non-proprietary formats (e.g., CSV instead of
Excel)
– Upload any data format
– Data API
●
Get records from
structured data
Data API
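A minimal sketch of reading records through the Data API, assuming the standard CKAN action endpoint `/api/3/action/datastore_search`. The site URL and resource ID in the usage comment are placeholders, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def datastore_search_url(site_url, resource_id, limit=5, filters=None):
    """Build a datastore_search request URL for one resource."""
    params = {"resource_id": resource_id, "limit": limit}
    if filters:
        # Filters are passed as a JSON object of field/value pairs.
        params["filters"] = json.dumps(filters)
    return (site_url.rstrip("/")
            + "/api/3/action/datastore_search?"
            + urlencode(params))


def fetch_records(url):
    """Call the API and return the list of records from the response."""
    with urlopen(url) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError("CKAN API call failed")
    return payload["result"]["records"]


# Usage (needs network access and a real resource ID, so it is not run here):
# url = datastore_search_url("https://demo.ckan.org", "<resource-id>", limit=3)
# for record in fetch_records(url):
#     print(record)
```

Building the URL separately from the network call makes the request easy to inspect or paste into a browser.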
36
CKAN and 5 Open Data★
●
★★★★ Use open standards from W3C (RDF and SPARQL) to
identify things, so that people can point at your stuff
●
★★★★★ Link your data to other data to provide context
– Built-in RDF exporting capabilities
– Expose or consume metadata from other catalogs using RDF
(DCAT) docs [1]
●
ckanext-qa [2]: Check the openness of datasets or resources
1. Supported by the ckanext-dcat extension
2. https://github.com/ckan/ckanext-qa
37
Experiences
38
System Architecture
39
Installation
●
Official Documents:
– http://docs.ckan.org/en/latest/
●
Installation Notes (In Chinese):
– https://ckan-docs-tw.readthedocs.org/
40
Customizations for Taijiang.tw
●
Custom Metadata
●
Data Visualization
●
Custom filters
●
Harvest
●
Localization
●
Source Code Released under AGPLv3 (On GitHub: u10313335)
– ckanext-taijiang
– ckanext-spatial
– taijiang-ckan-translations
– taijiang-bulk-uploader
41
Custom Metadata
●
Extension: ckanext-scheming [1]
– Configure and share CKAN schemas using a JSON
schema description.
– Custom template snippets for editing and displaying
fields.
Template Name          Function
text.html              a simple text field for free-form text
large_text.html        a larger text field
date.html              a date widget
markdown.html          a markdown field
select.html            a select box
multiple_choice.html   a group of checkboxes
repeating.html         repeating fields
1. https://github.com/open-data/ckanext-scheming, only for CKAN 2.3+
42
Custom Metadata – Example
{
  "field_name": "data_type",
  "label": {"en": "Data Type", "zh_TW": "資料類型"},
  "preset": "select",
  "form_attrs": {"data-module": "autocomplete"},
  "choices": [{"value": "statistics", "label": "Statistics"}]
}
{
  "field_name": "ref",
  "preset": "repeating_text",
  "label": {"en": "Reference", "zh_TW": "參考來源"},
  "form_blanks": 3
}
select
repeating_text
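To illustrate how such a field definition can be consumed, here is a small sketch that picks a label for a given language and lists the allowed choice values. The helper names are hypothetical and this is not ckanext-scheming's own code.

```python
import json

# The "data_type" field from the slide, loaded as a plain dict.
FIELD = json.loads("""
{
  "field_name": "data_type",
  "label": {"en": "Data Type", "zh_TW": "資料類型"},
  "preset": "select",
  "choices": [{"value": "statistics", "label": "Statistics"}]
}
""")


def label_for(field, lang, fallback="en"):
    """Return the label in `lang`, falling back to English, then the field name."""
    label = field.get("label", field["field_name"])
    if isinstance(label, dict):
        return label.get(lang) or label.get(fallback) or field["field_name"]
    return label


def allowed_values(field):
    """Return the values a select field accepts."""
    return [choice["value"] for choice in field.get("choices", [])]


print(label_for(FIELD, "zh_TW"))
print(label_for(FIELD, "fr"))
print(allowed_values(FIELD))
```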
43
Validator and Converter
●
Ensure data quality
44
Validator and Converter
●
Validator
– Validate user inputs
– Ex. json_validator
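import json

from ckan.plugins.toolkit import Invalid


def json_validator(value, context):
    # Empty input passes through unchanged.
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        # json.loads raises ValueError on malformed input.
        raise Invalid('Invalid JSON')
    return value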
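The validator can be tried without a CKAN installation. The sketch below repeats it with a stand-in `Invalid` exception (an assumption made so the code runs on its own; in CKAN the real class comes from the plugins toolkit) and a hypothetical `check` helper.

```python
import json


class Invalid(Exception):
    """Stand-in for CKAN's Invalid exception, for illustration only."""


def json_validator(value, context):
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        raise Invalid('Invalid JSON')
    return value


def check(value):
    """Run the validator and describe the outcome as a string."""
    try:
        json_validator(value, {})
        return "ok"
    except Invalid as error:
        return "rejected: %s" % error


print(check('["a", "b"]'))
print(check('[a, b'))
print(check(''))
```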
45
Validator and Converter
●
Converter
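– Convert data for storage
– Ex. duplicate_validator
import json


def duplicate_validator(key, data, errors, context):
    # Skip the conversion if earlier validators already reported errors.
    if errors[key]:
        return
    # The field holds a JSON list; set() removes duplicates
    # (the original order is not preserved).
    value = json.loads(data[key])
    unduplicated = list(set(value))
    data[key] = json.dumps(unduplicated)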
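A converter with this four-argument signature edits `data[key]` in place. The sketch below calls one by hand with small `data` and `errors` dicts keyed by a tuple. It is an order-preserving variant (an assumption, not the slide's code, which uses `set()`), and the function name is hypothetical.

```python
import json


def dedupe_keep_order(key, data, errors, context):
    """Drop duplicate list items while keeping the first occurrence of each."""
    if errors[key]:
        return
    values = json.loads(data[key])
    # dict.fromkeys keeps insertion order, so the first occurrence wins.
    data[key] = json.dumps(list(dict.fromkeys(values)))


key = ("ref",)
data = {key: '["a", "b", "a"]'}
errors = {key: []}

dedupe_keep_order(key, data, errors, {})
print(data[key])
```

Keeping the order matters when the list is shown to users, for example a list of references.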
46
Data Visualization
●
There is no viewer for some GIS formats
– WMTS services
– ESRI Shapefile (*.shp and *.dbf)
●
Do It Ourselves!
– wmts_view
– shp_view
47
Write a CKAN Plugin
●
PyUtilib Component Architecture (PCA)
●
Inherits from
– ckan.plugins.SingletonPlugin
●
Implements
– one (or several) ckan.plugins.* interfaces
48
To Build a "viewer"
●
We need more…
– View template (Jinja template engine)
– JavaScript module
●
Ex. Shapefile preview includes shp2geojson.js [1].
1. http://gipong.github.io/shp2geojson.js/ (Released under MIT license)
49
Example: Plugin for SHP Preview
Python Plugin
from ckan import plugins as p


class SHPView(p.SingletonPlugin):
    p.implements(p.IResourceView, inherit=True)

    # self.SHP, self.same_domain and self.proxy_is_enabled are used below
    # but are not defined on the slide.

    def info(self):
        return {'name': 'shp_view',
                'title': 'shp',
                'icon': 'map-marker',
                'iframed': True,
                'default_title': 'SHP',
                }

    def can_view(self, data_dict):
        resource = data_dict['resource']
        format_lower = resource['format'].lower()
        if format_lower in self.SHP:
            return self.same_domain or self.proxy_is_enabled
        return False

    def view_template(self, context, data_dict):
        ...  # body not shown on the slide

View Template (shp.html)
<div data-module="shppreview" id="data-preview"
     data-module-map_config="{{ h.dump_json(map_config) }}"></div>

JS Module (shp_view.js)
// shapefile preview module
ckan.module('shppreview', function (jQuery, _) {
  return {
    initialize: function () {
      …
    },
    showPreview: function (url, data) {
      …
    }
  };
});
50
Result
http://taijiang.tw/dataset/tainangis-wmts
wmts_view
51
Result
shp_view QGIS
http://taijiang.tw/dataset/proj4-29
shp_view
52
Custom Filters
●
Find Datasets by
– Time period
– Self-defined categories
●
A New Plugin
– For Time Search
●
Implement IPackageController.before_search
– For Self-defined Categories
●
Implement IPackageController.before_index and
IFacets.dataset_facets
– Both need new definitions in the Solr schema
53
Example: Plugin for Time Search
Python Plugin
from ckan import plugins as p


class TaijiangDatasets(p.SingletonPlugin):
    p.implements(p.IPackageController, inherit=True)
    p.implements(p.IFacets)

    def before_search(self, search_params):
        ...
        # parse_date is a helper that is not shown on the slide.
        begin = parse_date(search_params['extras']['ext_begin_date'])
        end = parse_date(search_params['extras']['ext_end_date'])
        ...
        # Match datasets whose time range overlaps the search period.
        query = ("(start_time: [* TO {0}Z] AND end_time: [{0}Z TO *]) OR "
                 "(start_time: [{0}Z TO {1}Z] AND end_time: [{0}Z TO *])")
        query = query.format(begin.isoformat(), end.isoformat())
        search_params['q'] = query
        return search_params

    def dataset_facets(self, facets_dict, package_type):
        facets_dict['date_facet'] = p.toolkit._('Date of Dataset')
        return facets_dict

Solr Schema
<dynamicField name="*_time" type="date" indexed="true" stored="true"
              multiValued="false"/>
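The query string built in before_search can be inspected outside CKAN. The sketch below isolates that step in a hypothetical helper, `build_time_query`, using the same template as the slide. The dates are made-up sample values.

```python
from datetime import datetime


def build_time_query(begin, end):
    """Build a Solr query matching datasets that overlap [begin, end].

    Clause 1: the dataset started before `begin` and was still running then.
    Clause 2: the dataset started inside the search period.
    """
    template = ("(start_time: [* TO {0}Z] AND end_time: [{0}Z TO *]) OR "
                "(start_time: [{0}Z TO {1}Z] AND end_time: [{0}Z TO *])")
    return template.format(begin.isoformat(), end.isoformat())


query = build_time_query(datetime(2015, 1, 1), datetime(2015, 12, 31))
print(query)
```

The trailing `Z` marks the timestamps as UTC, the form Solr date fields expect, so this assumes the parsed dates are naive UTC values.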
54
Result
55
Harvest
●
ckanext-harvest
– Remote harvesting extension
– https://github.com/okfn/ckanext-harvest
●
Source Type
– CKAN
– CSW* (Catalog Service for the Web)
– WAF* (Web Accessible Folder)
– Custom (csv/xls/website… etc.)
*Provided by ckanext-spatial
56
Harvest
Job Dashboard
57
Harvest
Background Process
●
Manually
– (pyenv) $ paster --plugin=ckanext-harvest harvester <command> -c /etc/ckan/default/production.ini
  where <command> is one of gather_consumer, fetch_consumer, or run
●
Automatically
– Supervisor (for gather & fetch consumer)
– Cron (for run)
58
Harvest
The Harvesting Interface
from ckanext.harvest.harvesters.base import HarvesterBase


class SRDAHarvester(HarvesterBase):

    def _set_config(self, config_str):
        ...

    def info(self):
        ...

    def gather_stage(self, harvest_job):
        ...

    def fetch_stage(self, harvest_object):
        ...

    def import_stage(self, harvest_object):
        ...

See http://goo.gl/ZMnND7 for details.
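The three stages can be pictured with a toy model that does not use ckanext-harvest at all. In the sketch below, the class, the in-memory "remote catalog", and the `run` helper are all hypothetical. It only shows the order of work: gather lists what to harvest, fetch retrieves each item, and import turns it into a dataset.

```python
import json


class ToyHarvester:
    """Toy gather/fetch/import pipeline over an in-memory remote catalog."""

    def __init__(self, remote):
        self.remote = remote    # guid -> remote record (stands in for a remote site)
        self.objects = {}       # harvest object id -> {"guid", "content"}
        self.datasets = {}      # guid -> imported dataset dict

    def gather_stage(self, harvest_job):
        # Create one harvest object per remote record and return their ids.
        ids = []
        for guid in sorted(self.remote):
            object_id = "%s:%s" % (harvest_job, guid)
            self.objects[object_id] = {"guid": guid, "content": None}
            ids.append(object_id)
        return ids

    def fetch_stage(self, object_id):
        # Retrieve the remote record and store it on the harvest object.
        obj = self.objects[object_id]
        obj["content"] = json.dumps(self.remote[obj["guid"]])
        return True

    def import_stage(self, object_id):
        # Turn the stored content into a local dataset.
        obj = self.objects[object_id]
        if obj["content"] is None:
            return False
        record = json.loads(obj["content"])
        self.datasets[obj["guid"]] = {"name": obj["guid"],
                                      "title": record["title"]}
        return True


def run(harvester, job):
    """Run all three stages in order and count the imported datasets."""
    imported = 0
    for object_id in harvester.gather_stage(job):
        if harvester.fetch_stage(object_id) and harvester.import_stage(object_id):
            imported += 1
    return imported


harvester = ToyHarvester({"a": {"title": "Dataset A"}, "b": {"title": "Dataset B"}})
count = run(harvester, "job-1")
print(count, sorted(harvester.datasets))
```

In the real extension the stages run in separate background processes, which is why the gather and fetch consumers are started independently.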
59
Localization
●
Translation for UI
– Gettext Style i18n
– Babel (*.po & *.mo)
●
In Python
p.toolkit._('String')
●
In Jinja Template
{{ _('String') }}
– Jed (For JavaScript Modules)
●
_('String')_
●
Transifex
Open Knowledge / CKAN
60
Localization
●
Translation for Extensions
– opendatatrentino/ckan-custom-translations (GitHub)
●
Translation for Metadata
– Defined in JSON Schema
– "label": {"en": "Data Type", "zh_TW": "資料類型"}
61
Localization
●
Chinese Search
– Solr + mmseg4j [1] (A Java Tokenizer)
– Maximum Matching Algorithm [2] (By Dr. Chih-Hao Tsai)
– Copy to Solr folder and modify Solr schema
– Ref: http://is.gd/2Vpzgb
1. https://github.com/chenlb/mmseg4j-solr (Released under Apache 2.0 license)
2. http://technology.chtsai.org/mmseg/
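To give a feel for dictionary-based word segmentation, here is a sketch of simple forward maximum matching: at each position, take the longest dictionary word that fits. This is the basic idea only, not the full MMSEG algorithm or mmseg4j's implementation, and the tiny dictionary is made up for the example.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Segment `text` greedily, preferring the longest dictionary word."""
    if max_word_len is None:
        max_word_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so the loop always advances.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


WORDS = {"開放", "資料", "開放資料", "平台"}
print(forward_max_match("開放資料平台", WORDS))
```

Without such a tokenizer, a Chinese query may be indexed character by character or as one unbroken string, depending on the analyzer, which hurts search quality.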
62
Contribution: What and How?
63
What to Contribute?
●
CKAN Core Features
– Time and spatial search for private datasets
– Publish datasets as a catalogue service (e.g., CSW)
– Web interface for bulk uploads
– A simplified deployment process
– Issues on GitHub: https://github.com/ckan/ckan/issues
– More ideas:
https://github.com/ckan/ideas-and-roadmap
64
What to Contribute?
●
i18n
– Non-ASCII filenames
– Translate JS modules (e.g., Recline.js)
– UI Translation (Transifex)
65
What to Contribute?
●
More Functions for Using Data in Web Browser
– Audio and video playback (e.g., integrating plyr.io)
– Link to third-party services [1], like Shiny [2] (R-based) or
IPython Notebook (Python-based)
1. http://www.data.gov/meta/open-apps/
2. https://github.com/ckan/ideas-and-roadmap/issues/35
66
What to Contribute?
●
Rebuild data.g0v.tw with CKAN?
● data.g0v.tw (零時資料中心, g0v Data Center)
– Built with DKAN (a CKAN clone for Drupal)
●
Problems of DKAN
– Development is much slower than CKAN's
– Lacks features introduced in later versions of CKAN
●
Ex. Multiple persistent views of data (in CKAN 2.3)
– Most government sites in Taiwan use (or will use) CKAN instead of
DKAN
67
How to Contribute?
●
CKAN Core: ckan/ckan (GitHub)
●
Most plugins are also available on GitHub
– http://extensions.ckan.org/
●
Development Discussions (Mailing List)
– https://lists.okfn.org/mailman/listinfo/ckan-dev
●
Contributing Guide
– http://docs.ckan.org/en/latest/contributing/index.html
68
Thanks for your attention!
Any Q?
Email: cjlee AT iis.sinica.edu.tw
Profile: http://about.me/sollee
Google Groups: CKAN Taiwan Interest Group
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 

Putting Open Data on Open Source Platforms: Sharing Development Experience with CKAN, an Open Source Data Portal Platform

  • 1. Putting Open Data on Open Source Platforms: Sharing Development Experience with CKAN, an Open Source Data Portal Platform @ FOSS and Project Collaboration (Spring 2015) This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Taiwan License. Presenter: 李承錱 Cheng-Jen Lee (Sol) Email: cjlee AT iis.sinica.edu.tw
  • 2. About Me ● Sol, @u10313335 ● Institute of Information Science, Academia Sinica ● https://about.me/SolLee ● Python / R / Java ● Focus Areas – CMS – Data Repository – Open Data – *nix System Administration
  • 3. Agenda ● Open Data and Open Data Portals ● About CKAN ● CKAN and 5★ Open Data ● Experiences ● Contribution: What and How?
  • 4. Open Data and Open Data Portals ● Open Data – The idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control [1]. ● Open Data Portals – Facilitate access to and re-use of public sector information [2]. – The "infrastructure" of open data [1] Wikipedia: Open data https://en.wikipedia.org/wiki/Open_data [2] Open Data Portals - Digital Agenda for Europe http://ec.europa.eu/digital-agenda/en/open-data-portals
  • 6. CKAN ● The Comprehensive Knowledge Archive Network ● A powerful data management system for – Publishing – Sharing – Finding – Using data
  • 8. The Most Popular Platform for Open Data ● 116 instances around the world as of March 2015 ● http://ckan.org/instances
  • 9. The Most Popular Platform for Open Data ● Widely used in government data portals – Among EU member states, 30% of open data portals had adopted CKAN (OpenDataMonitor [1], March 2015) ● Workflow support for publishing data ● Data Visualization ● 100+ Extensions ● Powerful APIs ● Open source (AGPLv3) [1] http://www.opendatamonitor.eu
  • 20. Publish Datasets ① Add Dataset Information
  • 21. Publish Datasets ② Add Data under the Dataset
  • 25. Data Preview and Visualization: recline_view (csv, xls), Grid
  • 26. Data Preview and Visualization: recline_view (csv, xls), Graph
  • 27. Data Preview and Visualization: recline_view (csv, xls), Lat/Long fields
  • 28. Data Preview and Visualization: wms_preview
  • 29. Data Preview and Visualization: geojson_preview
  • 30. Data Preview and Visualization ● Docs: recline_view, text_view, json_view, pdf_view, webpage_view, officedocs_view... ● Pics: image_view ● And more!
  • 33. CKAN and 5★ Open Data [1] [1] Tim Berners-Lee, "Linked Data" http://www.w3.org/DesignIssues/LinkedData.html
  • 34. CKAN and 5★ Open Data ● ★ Make your stuff available on the Web (whatever format) under an open license – Customizable licenses
  • 35. CKAN and 5★ Open Data ● ★★ Make it available as structured data (e.g., Excel instead of an image scan of a table) ● ★★★ Use non-proprietary formats (e.g., CSV instead of Excel) – Upload any data format – Data API ● Get records from structured data
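To make "get records from structured data" concrete, here is a minimal sketch that builds a request URL for CKAN's `datastore_search` action. The site URL and resource ID are placeholders (assumptions, not from the talk), and the sketch only constructs the URL; fetching it would return JSON records only if the resource has been loaded into the DataStore.

```python
from urllib.parse import urlencode


def datastore_search_url(site_url, resource_id, limit=5, query=None):
    """Build a URL for the CKAN Action API's datastore_search call."""
    params = {"resource_id": resource_id, "limit": limit}
    if query:
        # Optional full-text query across the resource's fields.
        params["q"] = query
    return site_url.rstrip("/") + "/api/3/action/datastore_search?" + urlencode(params)


# Placeholder site and resource ID, for illustration only.
url = datastore_search_url(
    "https://ckan.example.org/",
    "00000000-0000-0000-0000-000000000000",
)
print(url)
```

The response, when the call succeeds, is a JSON object whose `result.records` list holds the rows.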
  • 36. CKAN and 5★ Open Data ● ★★★★ Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff ● ★★★★★ Link your data to other data to provide context – Built-in RDF exporting capabilities – Expose or consume metadata from other catalogs using RDF (DCAT) documents [1] ● ckanext-qa [2]: Check the openness of datasets or resources [1] Supported by the ckanext-dcat extension [2] https://github.com/ckan/ckanext-qa
  • 39. Installation ● Official Documentation: – http://docs.ckan.org/en/latest/ ● Installation Notes (in Chinese): – https://ckan-docs-tw.readthedocs.org/
  • 40. Customizations for Taijiang.tw ● Custom Metadata ● Data Visualization ● Custom Filters ● Harvest ● Localization ● Source code released under AGPLv3 (on GitHub: u10313335) – ckanext-taijiang – ckanext-spatial – taijiang-ckan-translations – taijiang-bulk-uploader
  • 41. Custom Metadata ● Extension: ckanext-scheming [1] – Configure and share CKAN schemas using a JSON schema description. – Custom template snippets for editing and displaying fields:
      Template Name          Function
      text.html              a simple text field for free-form text
      large_text.html        a larger text field
      date.html              a date widget
      markdown.html          a markdown field
      select.html            a select box
      multiple_choice.html   a group of checkboxes
      repeating.html         a set of repeating fields
    [1] https://github.com/open-data/ckanext-scheming (CKAN 2.3+ only)
  • 42. Custom Metadata – Example
      select:
      {
        "field_name": "data_type",
        "label": {"en": "Data Type", "zh_TW": "資料類型"},
        "preset": "select",
        "form_attrs": {"data-module": "autocomplete"},
        "choices": [{"value": "statistics", "label": "Statistics"}]
      }
      repeating_text:
      {
        "field_name": "ref",
        "preset": "repeating_text",
        "label": {"en": "Reference", "zh_TW": "參考來源"},
        "form_blanks": 3
      }
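The two field definitions above are fragments; ckanext-scheming reads them from a complete schema file. The sketch below assembles such a file in Python and serializes it to JSON. The wrapper keys (`scheming_version`, `dataset_type`, `dataset_fields`, `resource_fields`) follow the scheming README as far as I know; the `about` text and the empty resource field list are assumptions for illustration, and a real schema would also declare the standard fields such as title and name.

```python
import json

# The two field definitions from the slide, as Python dicts.
data_type_field = {
    "field_name": "data_type",
    "label": {"en": "Data Type", "zh_TW": "資料類型"},
    "preset": "select",
    "form_attrs": {"data-module": "autocomplete"},
    "choices": [{"value": "statistics", "label": "Statistics"}],
}
ref_field = {
    "field_name": "ref",
    "preset": "repeating_text",
    "label": {"en": "Reference", "zh_TW": "參考來源"},
    "form_blanks": 3,
}

# Assumed wrapper: a minimal schema document holding the fields.
schema = {
    "scheming_version": 1,
    "dataset_type": "dataset",
    "about": "Illustrative schema, not the one used by Taijiang.tw",
    "dataset_fields": [data_type_field, ref_field],
    "resource_fields": [],
}

# ensure_ascii=False keeps the Chinese labels readable in the output file.
schema_json = json.dumps(schema, ensure_ascii=False, indent=2)
print(schema_json)
```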
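  • 44. Validator and Converter ● Validator – Validates user input – Ex. json_validator
      import json
      from ckan.plugins.toolkit import Invalid

      def json_validator(value, context):
          # An empty value is accepted as-is.
          if value == '':
              return value
          # Reject anything that does not parse as JSON.
          try:
              json.loads(value)
          except ValueError:
              raise Invalid('Invalid JSON')
          return value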
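To try the validator outside CKAN, the sketch below swaps in a local `Invalid` exception as a stand-in for `ckan.plugins.toolkit.Invalid` (an assumption made only so the example runs on its own) and adds a small hypothetical `check` helper to show the three outcomes.

```python
import json


class Invalid(Exception):
    """Local stand-in for ckan.plugins.toolkit.Invalid (assumption)."""


def json_validator(value, context):
    # Same logic as on the slide: empty passes, bad JSON is rejected.
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        raise Invalid('Invalid JSON')
    return value


def check(value):
    """Hypothetical helper: report whether a value passes validation."""
    try:
        json_validator(value, {})
        return 'ok'
    except Invalid as error:
        return str(error)


for sample in ['{"a": 1}', '', '{broken']:
    print(repr(sample), '->', check(sample))
```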
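  • 45. Validator and Converter ● Converter – Converts data before it is stored – Ex. duplicate_validator
      import json

      def duplicate_validator(key, data, errors, context):
          # Skip conversion if an earlier validator already reported an error.
          if errors[key]:
              return
          # Parse the JSON list, drop duplicates, and store it back as JSON.
          value = json.loads(data[key])
          unduplicated = list(set(value))
          data[key] = json.dumps(unduplicated)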
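One thing to be aware of in the converter above: `list(set(value))` does not keep the original order of the entries. If order matters (for example, a list of references), an order-preserving variant is a small change. The sketch below is that variant, not the code from the talk; the function name `dedupe_converter` and the plain string key are illustrative (in CKAN the flattened key is a tuple).

```python
import json


def dedupe_converter(key, data, errors, context):
    """Order-preserving variant of the slide's converter (a sketch)."""
    # Skip conversion if an earlier validator already reported an error.
    if errors.get(key):
        return
    values = json.loads(data[key])
    # dict.fromkeys keeps the first occurrence of each item, in order.
    data[key] = json.dumps(list(dict.fromkeys(values)))


data = {"ref": '["b", "a", "b", "c", "a"]'}
errors = {"ref": []}
dedupe_converter("ref", data, errors, {})
print(data["ref"])
```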
  • 46. Data Visualization ● There is no viewer for some GIS formats – WMTS services – ESRI Shapefile (*.shp and *.dbf) ● Do It Ourselves! – wmts_view – shp_view
  • 47. Write a CKAN Plugin ● PyUtilib Component Architecture (PCA) ● Inherits from – ckan.plugins.SingletonPlugin ● Implements – one (or several) ckan.plugins.* interfaces
  • 48. To Build a "viewer" ● We need more… – View template (Jinja template engine) – JavaScript module ● Ex. Shapefile preview includes shp2geojson.js [1]. [1] http://gipong.github.io/shp2geojson.js/ (released under the MIT license)
  • 49. Example: Plugin for SHP Preview
      Python Plugin:
      from ckan import plugins as p

      class SHPView(p.SingletonPlugin):
          p.implements(p.IResourceView, inherit=True)

          # self.SHP, self.same_domain and self.proxy_is_enabled are
          # set elsewhere in the plugin (not shown on the slide).

          def info(self):
              return {'name': 'shp_view',
                      'title': 'shp',
                      'icon': 'map-marker',
                      'iframed': True,
                      'default_title': 'SHP',
                      }

          def can_view(self, data_dict):
              resource = data_dict['resource']
              format_lower = resource['format'].lower()
              if format_lower in self.SHP:
                  return self.same_domain or self.proxy_is_enabled
              return False

          def view_template(self, context, data_dict):
              return 'shp.html'

      View Template (shp.html):
      <div data-module="shppreview" id="data-preview" data-module-map_config="{{ h.dump_json(map_config) }}"></div>

      JS Module (shp_view.js):
      // shapefile preview module
      ckan.module('shppreview', function (jQuery, _) {
        return {
          initialize: function () { … },
          showPreview: function (url, data) { … }
        };
      });
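The `can_view` check above combines two conditions: the resource must be a shapefile, and the browser must be able to fetch it, which means it is either served from the same domain as the CKAN site or reachable through a proxy. Below is a standalone sketch of that decision as a plain function. The format list, the `on_same_domain` helper, and the parameter names are assumptions for illustration; the real plugin reads these from its own attributes and from CKAN configuration.

```python
from urllib.parse import urlparse

# Assumed set of format strings treated as shapefiles.
SHP_FORMATS = ("shp", "shapefile")


def on_same_domain(site_url, resource_url):
    """True when the resource is served from the CKAN site's own host."""
    return urlparse(site_url).netloc == urlparse(resource_url).netloc


def can_view(resource, site_url, proxy_enabled=False):
    """Decide whether the shapefile viewer can display this resource."""
    fmt = (resource.get("format") or "").lower()
    if fmt not in SHP_FORMATS:
        return False
    # The in-browser preview can only load same-domain or proxied files.
    return on_same_domain(site_url, resource.get("url", "")) or proxy_enabled


local = {"format": "SHP", "url": "https://ckan.example.org/files/roads.shp"}
remote = {"format": "shp", "url": "https://other.example.com/roads.shp"}
print(can_view(local, "https://ckan.example.org"))
print(can_view(remote, "https://ckan.example.org"))
print(can_view(remote, "https://ckan.example.org", proxy_enabled=True))
```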
  • 52. Custom Filters ● Find Datasets by – Time period – Self-defined categories ● A New Plugin – For Time Search ● Implement IPackageController.before_search – For Self-defined Categories ● Implement IPackageController.before_index and IFacets.dataset_facets – Both need new field definitions in the Solr schema
  • 53. Example: Plugin for Time Search
      Python Plugin:
      from ckan import plugins as p

      class TaijiangDatasets(p.SingletonPlugin):
          p.implements(p.IPackageController, inherit=True)
          p.implements(p.IFacets)

          def before_search(self, search_params):
              …
              begin = parse_date(search_params['extras']['ext_begin_date'])
              end = parse_date(search_params['extras']['ext_end_date'])
              ...
              query = ("(start_time: [* TO {0}Z] AND end_time: [{0}Z TO *]) OR "
                       "(start_time: [{0}Z TO {1}Z] AND end_time: [{0}Z TO *])")
              query = query.format(begin.isoformat(), end.isoformat())
              search_params['q'] = query
              return search_params

          def dataset_facets(self, facets_dict, package_type):
              facets_dict['date_facet'] = p.toolkit._('Date of Dataset')
              return facets_dict

      Solr Schema:
      <dynamicField name="*_time" type="date" indexed="true" stored="true" multiValued="false"/>
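The Solr query in `before_search` selects datasets whose time span overlaps the requested period: either the dataset already covers the begin date, or it starts inside the period. The sketch below isolates the query construction and mirrors the two clauses in plain Python so the logic can be checked without Solr. The helper names `build_time_query` and `matches` are hypothetical; the query template is copied from the slide.

```python
from datetime import datetime


def build_time_query(begin, end):
    """Format the slide's Solr range query for a begin/end period."""
    template = ("(start_time: [* TO {0}Z] AND end_time: [{0}Z TO *]) OR "
                "(start_time: [{0}Z TO {1}Z] AND end_time: [{0}Z TO *])")
    return template.format(begin.isoformat(), end.isoformat())


def matches(start_time, end_time, begin, end):
    """Plain-Python mirror of the two Solr clauses."""
    # Clause 1: the dataset starts before the period and is still running.
    covers_begin = start_time <= begin <= end_time
    # Clause 2: the dataset starts inside the period.
    starts_inside = begin <= start_time <= end and end_time >= begin
    return covers_begin or starts_inside


begin, end = datetime(2015, 1, 1), datetime(2015, 12, 31)
print(build_time_query(begin, end))
print(matches(datetime(2014, 6, 1), datetime(2015, 2, 1), begin, end))
print(matches(datetime(2016, 1, 1), datetime(2016, 2, 1), begin, end))
```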
  • 55. Harvest ● ckanext-harvest – Remote harvesting extension – https://github.com/okfn/ckanext-harvest ● Source Type – CKAN – CSW* (Catalog Service for the Web) – WAF* (Web Accessible Folder) – Custom (csv, xls, websites, etc.) *Provided by ckanext-spatial
  • 57. Harvest Background Process ● Manually, running one of gather_consumer, fetch_consumer, or run – (pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer/fetch_consumer/run -c /etc/ckan/default/production.ini ● Automatically – Supervisor (for the gather and fetch consumers) – Cron (for run)
  • 58. Harvest: The Harvesting Interface
      from ckanext.harvest.harvesters.base import HarvesterBase

      class SRDAHarvester(HarvesterBase):
          def _set_config(self, config_str): ...
          def info(self): ...
          def gather_stage(self, harvest_job): …
          def fetch_stage(self, harvest_object): ...
          def import_stage(self, harvest_object): ...
    See http://goo.gl/ZMnND7 for details.
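The three stage methods run in sequence: gather decides what to harvest, fetch retrieves each record, and import turns it into a dataset. The toy below imitates that flow without CKAN. Everything in it (the `ToyHarvester` class, the dict-based "remote source", the `run_job` driver) is a hypothetical stand-in; in ckanext-harvest the stages are driven by the gather and fetch queue consumers, and harvest objects are database records rather than dicts.

```python
class ToyHarvester:
    """Stand-in for a HarvesterBase subclass; no CKAN required."""

    def __init__(self, remote):
        self.remote = remote      # pretend remote source: id -> raw content
        self.datasets = {}        # pretend CKAN: id -> dataset dict

    def gather_stage(self, harvest_job):
        # Decide what to harvest: one identifier per remote record.
        return sorted(self.remote)

    def fetch_stage(self, harvest_object):
        # Retrieve the raw content for a single record.
        harvest_object["content"] = self.remote[harvest_object["id"]]
        return True

    def import_stage(self, harvest_object):
        # Convert the raw content into a dataset and store it.
        title = harvest_object["content"].strip().title()
        self.datasets[harvest_object["id"]] = {"title": title}
        return True


def run_job(harvester, harvest_job=None):
    """Drive the three stages in order, as the background process would."""
    for object_id in harvester.gather_stage(harvest_job):
        harvest_object = {"id": object_id, "content": None}
        if harvester.fetch_stage(harvest_object):
            harvester.import_stage(harvest_object)
    return harvester.datasets


result = run_job(ToyHarvester({"a": " rainfall 2014 ", "b": "land use"}))
print(result)
```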
  • 59. Localization ● Translation for UI – Gettext-style i18n – Babel (*.po & *.mo) ● In Python: p.toolkit._('String') ● In Jinja templates: {{ _('String') }} ● Transifex: Open Knowledge / CKAN – Jed (for JavaScript modules) ● _('String')_
  • 60. Localization ● Translation for Extensions – opendatatrentino/ckan-custom-translations (GitHub) ● Translation for Metadata – Defined in the JSON schema – "label": {"en": "Data Type", "zh_TW": "資料類型"}
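A multilingual `label` like the one above has to be resolved to a single string for the current interface language. The sketch below shows one way to do that lookup with a fallback. The function `pick_label` is hypothetical and written for illustration; ckanext-scheming ships its own helper for this purpose, whose exact behavior may differ.

```python
def pick_label(label, lang, fallback="en"):
    """Return the label text for lang, falling back when it is missing."""
    # A plain string means the label is not translated at all.
    if isinstance(label, str):
        return label
    # Prefer the requested language, then the fallback, then anything.
    return label.get(lang) or label.get(fallback) or next(iter(label.values()), "")


label = {"en": "Data Type", "zh_TW": "資料類型"}
print(pick_label(label, "zh_TW"))
print(pick_label(label, "ja"))
print(pick_label("Statistics", "zh_TW"))
```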
  • 61. Localization ● Chinese Search – Solr + mmseg4j [1] (a Java tokenizer) – Maximum Matching Algorithm [2] (by Dr. Chih-Hao Tsai) – Copy to the Solr folder and modify the Solr schema – Ref: http://is.gd/2Vpzgb [1] https://github.com/chenlb/mmseg4j-solr (released under the Apache 2.0 license) [2] http://technology.chtsai.org/mmseg/
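Chinese text has no spaces between words, so the search index needs a tokenizer that splits it using a dictionary. The sketch below shows the simplest form of the idea, forward maximum matching: at each position, take the longest dictionary word that fits. It is a simplification and not mmseg4j's implementation; MMSEG adds further rules for resolving ambiguous splits. The tiny dictionary is made up for the example.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Split text greedily into the longest dictionary words, left to right."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a last resort.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


words = {"開放", "資料", "開放資料", "平台", "台南", "台南市"}
print(forward_max_match("開放資料平台", words))
print(forward_max_match("台南市開放資料", words))
```

With word-level tokens in the index, a search for 開放資料 matches documents that contain that word, rather than relying on single-character matches.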
  • 63. What to Contribute? ● CKAN Core Features – Time and spatial search for private datasets – Publish datasets as a catalogue service (e.g., CSW) – Web interface for bulk uploads – A simplified deployment process – Issues on GitHub: https://github.com/ckan/ckan/issues – More ideas: https://github.com/ckan/ideas-and-roadmap
  • 64. What to Contribute? ● i18n – Non-ASCII filenames – Translate JS modules (e.g., Recline.js) – UI translation (Transifex)
  • 65. What to Contribute? ● More Functions for Using Data in the Web Browser – Audio and video playback (e.g., by integrating plyr.io) – Links to third-party services [1], like Shiny [2] (R-based) or IPython Notebook (Python-based) [1] http://www.data.gov/meta/open-apps/ [2] https://github.com/ckan/ideas-and-roadmap/issues/35
  • 66. What to Contribute? ● Rebuild data.g0v.tw with CKAN? ● data.g0v.tw (零時資料中心) – Built with DKAN (a CKAN clone for Drupal) ● Problems of DKAN – Development is much slower than CKAN's – Lacks features introduced in later versions of CKAN ● Ex. Multiple persistent views of data (in CKAN 2.3) – Most government sites in Taiwan use (or will use) CKAN instead of DKAN
  • 67. How to Contribute? ● CKAN Core: ckan/ckan (GitHub) ● Most plugins are also available on GitHub – http://extensions.ckan.org/ ● Development Discussions (Mailing List) – https://lists.okfn.org/mailman/listinfo/ckan-dev ● Contributing Guide – http://docs.ckan.org/en/latest/contributing/index.html
  • 68. Thanks for your attention! Any questions? Email: cjlee AT iis.sinica.edu.tw Profile: http://about.me/sollee Google Groups: CKAN Taiwan Interest Group

Editor's Notes

  • #2: Geographic Information Map Cloud Service Platform
  • #7: Supported by the Open Knowledge Foundation
  • #8: Publish the URL
  • #9: Supported by the Open Knowledge Foundation
  • #10: Supported by the Open Knowledge Foundation
  • #28: Store the raw data and metadata. Visualise structured data with interactive tables, graphs and maps.
  • #33: Geographic Information Map Cloud Service Platform