DataFinder: Concepts and Usage
German Aerospace Center (DLR), Cologne/Berlin/Braunschweig
http://www.dlr.de/sc
Outline
- Introduction
- Configuration and customization
  - Requirements analysis
  - Installation
  - Configuration
  - Customization
  - Data migration
DataFinder Introduction
Background: The Data Management Problem
- Absent organizational structures
  - No central data management policy
  - Every employee organizes his/her data individually
  - Researchers spend about 30% of their time searching for data
  - Problems with data left behind by temporary staff
- Increasing amounts of data because of growing data sizes and regulations
  - Rapidly growing volume of simulation and experimental data
  - Legal requirements for long-term availability of data (up to 50 years!)
- The situation is similar at every DLR institute, at many research labs and agencies, and even in industry
DataFinder Introduction
Basic Concept
- Lightweight client-server solution
- Based on open and stable standards, such as XML and WebDAV
- Extensible through Python scripts to fit multiple scenarios
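The slide names XML and WebDAV as the underlying standards. As an illustration only (this is not DataFinder's own client code), the sketch below uses Python's standard `xml.etree.ElementTree` to build the XML body of a WebDAV PROPFIND request (RFC 4918) and to read property values back out of a 207 Multi-Status response. The namespace `http://example.org/ns` and the property names are placeholders, not DataFinder's real meta data names.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # namespace of all core WebDAV elements (RFC 4918)


def build_propfind_body(properties):
    """Return a PROPFIND request body asking for the given (namespace, name) pairs."""
    ET.register_namespace("D", DAV_NS)
    propfind = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(propfind, f"{{{DAV_NS}}}prop")
    for namespace, name in properties:
        ET.SubElement(prop, f"{{{namespace}}}{name}")
    return ET.tostring(propfind, encoding="unicode")


def parse_multistatus(xml_text):
    """Map each response href to {property tag: text} for properties returned with 200."""
    root = ET.fromstring(xml_text)
    result = {}
    for response in root.findall(f"{{{DAV_NS}}}response"):
        href = response.findtext(f"{{{DAV_NS}}}href")
        props = {}
        for propstat in response.findall(f"{{{DAV_NS}}}propstat"):
            status = propstat.findtext(f"{{{DAV_NS}}}status", default="")
            if " 200 " not in status:
                continue  # skip properties the server could not return
            prop = propstat.find(f"{{{DAV_NS}}}prop")
            if prop is None:
                continue
            for element in prop:
                props[element.tag] = element.text
        result[href] = props
    return result
```

The body from `build_propfind_body` would be sent as the payload of an HTTP `PROPFIND` request (typically with a `Depth: 0` or `Depth: 1` header); keeping request building and response parsing as separate pure functions makes both easy to test without a server.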
DataFinder Introduction
Graphical User Interfaces of DataFinder 1.x
- User client and administrator client (screenshots; the current version differs)
- Implemented in Python with Qt/PyQt
DataFinder Introduction
Data Store Concept
[Figure: logical view, user client, storage locations]
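In the data store concept, users work with one logical view in the client, while the files themselves are kept in separately configured storage locations (data stores). The sketch below is a hypothetical model of that indirection, not DataFinder's implementation: it assumes each item records its data store name under the `____datastorename____` property (the key used in the scripting example later in this deck) and that a store is described by a base URL. The classes `DataStore` and `DataStoreRegistry` and all URLs are invented for illustration.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Property key taken from the deck's script example; how DataFinder
# resolves it internally is not shown in the slides and is assumed here.
DATASTORE_PROPERTY = "____datastorename____"


@dataclass(frozen=True)
class DataStore:
    name: str      # name shown to users, e.g. "Data Store"
    base_url: str  # physical storage root (hypothetical WebDAV URL)


class DataStoreRegistry:
    """Hypothetical lookup from logical items to physical storage locations."""

    def __init__(self, stores):
        self._stores = {store.name: store for store in stores}

    def physical_location(self, logical_path, properties):
        store = self._stores.get(properties.get(DATASTORE_PROPERTY))
        if store is None:
            raise KeyError(f"no data store configured for {logical_path!r}")
        # The logical path is kept; only the storage root differs per store.
        return store.base_url.rstrip("/") + quote(logical_path)
```

Because only the registry knows the physical roots, an administrator could in principle move a store to another server by changing one entry, without touching the logical hierarchy users see.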
DataFinder Configuration and Customization
DataFinder Configuration and Customization
Preparing DataFinder for specific "use cases"
- Requirements analysis
  - Analyze data, working environment, and user workflows
- Configuration
  - Server and client setup
  - Define and configure the data model
  - Configure distributed storage resources (data stores)
- Customization
  - Write functional (GUI) extensions with Python scripts
  - Tool integration
- Data migration
  - Analyze current data
  - Migrate the data into the new system
DataFinder Configuration and Customization
Installation
- Meta data server: Apache and Catacomb (based on the WebDAV protocol), or Apache and mod_dav (XAMPP)
- Data server: Apache and Catacomb (based on the WebDAV protocol), or Apache and mod_dav (XAMPP)
- Administrator and user client: source and precompiled versions (for Windows XP and 64-bit SUSE) available
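After installing Apache with Catacomb or mod_dav, a quick smoke test is to ask the server which WebDAV compliance classes it advertises. The sketch below uses only the Python standard library and is not part of DataFinder: it sends an HTTP `OPTIONS` request and reads the `DAV` response header defined in RFC 4918, where class 1 means basic WebDAV and class 2 adds locking. The function names are illustrative, and the server URL you pass in is your own.

```python
import http.client
from urllib.parse import urlsplit


def dav_compliance_classes(header_value):
    """Split a DAV header such as '1, 2' into a set of compliance classes."""
    if not header_value:
        return set()
    return {token.strip() for token in header_value.split(",") if token.strip()}


def check_webdav_server(url, timeout=10.0):
    """Send OPTIONS to url and return the DAV compliance classes it advertises."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection_class = http.client.HTTPSConnection
    else:
        connection_class = http.client.HTTPConnection
    connection = connection_class(parts.hostname, parts.port, timeout=timeout)
    try:
        connection.request("OPTIONS", parts.path or "/")
        response = connection.getresponse()
        response.read()  # drain the body so the connection can close cleanly
        return dav_compliance_classes(response.getheader("DAV"))
    finally:
        connection.close()
```

An empty result from `check_webdav_server` suggests that WebDAV is not enabled for that path; parsing is kept in its own function so it can be checked without a running server.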
DataFinder Configuration and Customization
Data Model: Mapping of Organizational Data Structures
[Figure: example data model. Legend: user, object (directory), object (file), relation. Example objects: Project A, Project B, Project C, Simulation I, Simulation II, Experiment, File 1, File 2]
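The data model maps organizational structures (projects, simulations, experiments, files) onto a hierarchy of objects connected by relations; in the demo the meta model is shown as an XML file. The Python sketch below only illustrates the idea with a hypothetical rule table saying which object type may contain which, so a project can hold simulations and experiments, but a file cannot sit directly under the root. The type names and rules are assumptions read off the example figure, not DataFinder's configuration format.

```python
from dataclasses import dataclass, field

# Hypothetical relation rules read off the example figure:
# which object type may contain which other object types.
ALLOWED_CHILDREN = {
    "root": {"project"},
    "project": {"simulation", "experiment"},
    "simulation": {"file"},
    "experiment": {"file"},
    "file": set(),  # files are leaves
}


@dataclass
class DataObject:
    name: str
    kind: str  # one of the keys of ALLOWED_CHILDREN
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach child if the relation rules allow it; return the child for chaining."""
        if child.kind not in ALLOWED_CHILDREN[self.kind]:
            raise ValueError(f"a {child.kind!r} may not be placed inside a {self.kind!r}")
        self.children.append(child)
        return child

    def paths(self, prefix=""):
        """Yield the logical path of every object below this one, depth first."""
        for child in self.children:
            path = f"{prefix}/{child.name}"
            yield path
            yield from child.paths(path)
```

Checking the rule at insertion time mirrors the deck's point that the administrator, not each user, decides the shape of the hierarchy.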
DataFinder Configuration and Customization
Excursus: Meta Data
- Describe and annotate data ("files") and collections ("directories")
- Different levels of meta data
  - Required meta data defined by the administrator
  - Users are free to choose additional meta data
- Different types of meta data: strings, numbers (float, double, …), lists, dates
- Users can search in the meta data
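To make "users can search in the meta data" concrete, the sketch below searches typed properties (string, number, list, date) with plain Python. It is a hypothetical in-memory illustration, not DataFinder's search API: `search`, the property names (`title`, `mach`, `keywords`, `created`), and the example items are all invented. A criterion is either a plain value (equality, or membership for list properties) or a predicate function, which covers range queries on numbers and dates.

```python
from datetime import date

# Hypothetical example items: logical path -> meta data of different types.
EXAMPLE_ITEMS = {
    "/Project A/Simulation I/run1.dat": {
        "title": "Run 1", "mach": 0.8, "keywords": ["wing", "cfd"], "created": date(2009, 5, 1),
    },
    "/Project A/Experiment/probe.csv": {
        "title": "Probe", "mach": 0.5, "keywords": ["wind tunnel"], "created": date(2008, 3, 2),
    },
}


def _satisfies(properties, name, criterion):
    if name not in properties:
        return False  # items without the property never match
    value = properties[name]
    if callable(criterion):
        return bool(criterion(value))
    if isinstance(value, list):
        return criterion in value  # list properties match on membership
    return value == criterion


def search(items, **criteria):
    """Return the paths whose meta data satisfy every criterion."""
    return [
        path
        for path, properties in items.items()
        if all(_satisfies(properties, name, c) for name, c in criteria.items())
    ]
```

For example, `search(EXAMPLE_ITEMS, keywords="cfd", created=lambda d: d >= date(2009, 1, 1))` selects only the simulation run.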
DataFinder Configuration and Customization
Excursus: Meta Data and the User
Impact: DataFinder restricts the rights of users!
- Enforcement of "good behavior"
  - Users must comply with organizational standards
  - Data is stored in a defined (directory) hierarchy on the data server
  - Required meta data must be set prior to upload
  - Users have certain access rights within the hierarchy
- "Damn! I'm a great scientist! I want the freedom to have my own directory layout…"
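The rule "required meta data must be set prior to upload" amounts to a validation gate in front of the upload. The sketch below is a hypothetical illustration of that gate, not DataFinder code: `REQUIRED`, `missing_or_invalid`, and `upload` are invented names, and `do_upload` stands in for whatever actually transfers the file. It reports every missing or wrongly typed required property before anything is stored.

```python
from datetime import date

# Hypothetical administrator-defined requirements: property name -> expected type.
REQUIRED = {"title": str, "project_id": str, "created": date}


def missing_or_invalid(properties, required=REQUIRED):
    """List all problems with the required meta data (empty list means OK)."""
    problems = []
    for name, expected in required.items():
        if name not in properties or properties[name] in (None, ""):
            problems.append(f"missing: {name}")
        elif not isinstance(properties[name], expected):
            problems.append(f"wrong type: {name} (expected {expected.__name__})")
    return problems


def upload(path, properties, do_upload):
    """Refuse the upload unless all required meta data is present and well typed."""
    problems = missing_or_invalid(properties)
    if problems:
        raise ValueError(f"upload of {path!r} refused: " + "; ".join(problems))
    return do_upload(path, properties)
```

Collecting all problems at once, rather than failing on the first, lets a client show the user everything that still has to be filled in.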
DataFinder Configuration and Customization
Customization: Python Scripting for Extension and Automation
- Integration of DataFinder with its environment (users, infrastructure, software, …)
- Extension of DataFinder by Python scripts
  - Actions for resources (e.g., files, directories)
  - User interface extensions
- Typical automations and customizations
  - Data migration and data import
  - Starting external applications (with downloaded data files)
  - Extracting meta data from result files
  - Automating recurring tasks ("workflows")
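One of the listed automations is extracting meta data from result files. The slides do not specify a file format, so the sketch below assumes a hypothetical text result file whose first lines are `# key = value` headers, and turns them into a property dictionary that a script could then attach to the uploaded item (for example, as part of the `properties` dictionary passed to `createLeaf` in the scripting example). Numeric values become `int` or `float`; everything else stays a string. The function names are illustrative.

```python
import re

# Hypothetical header format: "# key = value" lines at the top of a result file.
HEADER_LINE = re.compile(r"^#\s*([A-Za-z_][\w.]*)\s*=\s*(.*?)\s*$")


def _convert(text):
    """Turn numeric text into int or float; leave everything else as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def extract_metadata(lines):
    """Read '# key = value' header lines until the first non-header line."""
    properties = {}
    for line in lines:
        match = HEADER_LINE.match(line)
        if not match:
            break  # the header ends at the first line that is not "# key = value"
        key, value = match.groups()
        properties[key] = _convert(value)
    return properties
```

Because the function accepts any iterable of lines, it works on an open file object (`with open(path) as f: extract_metadata(f)`) as well as on a list in tests.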
DataFinder Configuration and Customization
Example: Downloading File and Starting Application

    # Creates the file "/test.txt" using the data store "Data Store".
    from datafinder.gui.user import script_api as gui_api
    from datafinder.script_api.repository import setWorkingRepository
    from datafinder.script_api.item.item_support import createLeaf

    # Get the representation of the currently managed repository.
    mr = gui_api.managedRepositoryDescription()

    def _createLeaf():
        # System properties: data format and target data store of the new file.
        properties = dict()
        properties["____dataformat____"] = "TEXT"
        properties["____datastorename____"] = "Data Store"
        # …
        createLeaf("/test.txt", properties)

    if mr is not None:
        # Make the managed repository the target of subsequent script API calls.
        setWorkingRepository(mr)
        # Execute _createLeaf while a progress dialog is shown.
        gui_api.performWithProgressDialog(_createLeaf)
DataFinder Demo
Live demo:
- DataFinder server structure
- Admin client: showing the XML file of the meta model and its representation in the client
- Admin client: setting up a data store for development files
- Admin client: loading a script extension
- User client: loading a script extension
- User client: creating a structure
- User client: uploading an experimental file into the store
- User client: double-clicking the file to open it
- User client: script extension creating a file
Availability
- DataFinder core available as open source
  - Current stable release: DataFinder 2.0
  - Simplified BSD License
- Open source platforms: Launchpad, SourceForge, Freshmeat
- Precompiled versions for Windows XP and 64-bit SLED
- Become a DataFinder fan on Facebook!
Links
- DataFinder web site: http://www.dlr.de/datafinder
- DataFinder open source: http://sourceforge.net/projects/datafinder and http://launchpad.net/datafinder
- DataFinder wiki: http://wiki.sistec.dlr.de/DataFinderOpenSource
- Catacomb (recommended server): http://catacomb.tigris.org

SOFTTECHHUB
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 

DataFinder concepts and example: General (20100503)

  • 10. DataFinder Configuration and Customization
    Data Model: Mapping of Organizational Data Structures
    [Figure: the user's view of an example data model, built from objects (directories), objects (files) and relations: Project A, Project B and Project C; Simulation I, Simulation II and an Experiment; File 1 and File 2]
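The mapping on this slide can be read as a typed tree: the administrator decides which kinds of collections (projects, simulations, experiments) may be nested where, and users create concrete items within those rules. The following is a minimal, hypothetical sketch in plain Python; the `Item` class, the `ALLOWED_CHILDREN` table and the type names are assumptions for illustration, not DataFinder's API or its XML meta-model format.

```python
from dataclasses import dataclass, field

# Hypothetical data model (assumption): which item types may be created below which.
# "root" is the top of the repository; "file" is a leaf type without children.
ALLOWED_CHILDREN = {
    "root": {"project"},
    "project": {"simulation", "experiment"},
    "simulation": {"file"},
    "experiment": {"file"},
    "file": set(),
}


@dataclass
class Item:
    name: str
    kind: str
    children: list = field(default_factory=list)

    def add(self, name: str, kind: str) -> "Item":
        """Create a child item, but only if the data model permits this relation."""
        if kind not in ALLOWED_CHILDREN[self.kind]:
            raise ValueError(f"a {kind!r} may not be created below a {self.kind!r}")
        child = Item(name, kind)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> list:
        """Return the paths of all items below this one, depth first."""
        result = []
        for child in self.children:
            path = f"{prefix}/{child.name}"
            result.append(path)
            result.extend(child.paths(path))
        return result


root = Item("", "root")
project_a = root.add("Project A", "project")
simulation = project_a.add("Simulation I", "simulation")
simulation.add("File 1", "file")
print(root.paths())
# ['/Project A', '/Project A/Simulation I', '/Project A/Simulation I/File 1']
```

Keeping the allowed relations in one table means the structure is defined centrally by the administrator, while `add` enforces it for every user.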
  • 11. DataFinder Configuration and Customization
    Digression: Meta Data
    - Describe and annotate data ("files") and collections ("directories")
    - Different levels of meta data: required meta data is defined by the administrator; users are free to add further properties
    - Different types of meta data: strings, numbers (float, double, …), lists, dates
    - Users can search in the meta data
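To make the levels and types of meta data concrete, here is a small sketch under stated assumptions: each property definition carries a value type and a "required" flag set by the administrator, properties without a definition count as free user additions, and a search filters items by a predicate on their meta data. The names `PropertyDefinition`, `validate`, `search` and all property names are hypothetical; DataFinder's actual property definitions and search syntax are not shown on the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PropertyDefinition:
    name: str
    value_type: type  # str, float, list or date in this sketch
    required: bool    # required properties are defined by the administrator


# Hypothetical definitions an administrator might configure (assumption)
DEFINITIONS = {
    d.name: d
    for d in (
        PropertyDefinition("author", str, True),
        PropertyDefinition("created", date, True),
        PropertyDefinition("mach_number", float, False),
        PropertyDefinition("keywords", list, False),
    )
}


def validate(properties: dict) -> list:
    """Return a list of problems; an empty list means the meta data is valid."""
    problems = []
    for definition in DEFINITIONS.values():
        if definition.required and definition.name not in properties:
            problems.append(f"missing required property {definition.name!r}")
    for name, value in properties.items():
        definition = DEFINITIONS.get(name)
        # Properties without a definition are free, user-chosen additions
        if definition is not None and not isinstance(value, definition.value_type):
            problems.append(f"{name!r} must be of type {definition.value_type.__name__}")
    return problems


def search(items: dict, predicate) -> list:
    """Return the sorted paths of all items whose meta data satisfies predicate."""
    return sorted(path for path, props in items.items() if predicate(props))


# Hypothetical items: logical path -> meta data
ITEMS = {
    "/Project A/run1.dat": {"author": "Smith", "created": date(2010, 5, 3), "mach_number": 0.8},
    "/Project A/run2.dat": {"author": "Jones", "created": date(2010, 4, 1), "mach_number": 1.2},
}

print(validate({"author": "Smith"}))  # ["missing required property 'created'"]
print(search(ITEMS, lambda p: p.get("mach_number", 0.0) > 1.0))  # ['/Project A/run2.dat']
```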
  • 12. DataFinder Configuration and Customization
    Digression: Meta Data and the Impact on Users
    - DataFinder restricts the rights of users! It enforces "good behavior":
    - Users must comply with organizational standards
    - Data is stored in a defined (directory) hierarchy on the data server
    - Required meta data must be set prior to upload
    - Users have specific access rights within the hierarchy
    - "Damn! I'm a great scientist! I want the freedom to have my own directory layout…"
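The restrictions on this slide amount to a check that runs before an upload is accepted: the target must lie in the defined hierarchy, the user needs write access there, and all required meta data must be present. The sketch below illustrates such a check in plain Python; the hierarchy, the access table, the required property names and the function `check_upload` are all hypothetical and are not taken from DataFinder.

```python
from pathlib import PurePosixPath

# Hypothetical organizational hierarchy defined by the administrator (assumption)
ALLOWED_COLLECTIONS = {"/Project A/Simulation I", "/Project A/Experiment"}

# Hypothetical access rights: user -> collections the user may write to
WRITE_ACCESS = {
    "alice": {"/Project A/Simulation I"},
    "bob": {"/Project A/Simulation I", "/Project A/Experiment"},
}

# Hypothetical required meta data
REQUIRED_PROPERTIES = {"author", "description"}


def check_upload(user: str, target_path: str, properties: dict) -> list:
    """Return the reasons why an upload is rejected (empty list: accepted)."""
    reasons = []
    collection = str(PurePosixPath(target_path).parent)
    if collection not in ALLOWED_COLLECTIONS:
        reasons.append(f"{collection} is not part of the defined hierarchy")
    elif collection not in WRITE_ACCESS.get(user, set()):
        reasons.append(f"{user} has no write access to {collection}")
    missing = REQUIRED_PROPERTIES - set(properties)
    if missing:
        reasons.append("missing required meta data: " + ", ".join(sorted(missing)))
    return reasons


print(check_upload("bob", "/home/bob/run.dat", {}))
# ['/home/bob is not part of the defined hierarchy', 'missing required meta data: author, description']
```

Collecting all reasons instead of stopping at the first one lets a client tell the user everything that must be fixed before the upload can proceed.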
  • 13. DataFinder Configuration and Customization
    Customization: Python Scripting for Extension and Automation
    - Integration of DataFinder with its environment: users, infrastructure, software, …
    - Extension of DataFinder by Python scripts: actions for resources (i.e., files and directories), user interface extensions
    - Typical automations and customizations: data migration and data import; starting external applications (with downloaded data files); extraction of meta data from result files; automation of recurring tasks ("workflows")
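As an illustration of the automation "extraction of meta data from result files", the sketch below reads a header of `# key = value` lines at the top of a text result file into a properties dictionary, which a script extension could then attach to the uploaded item. The header format, the example file and the function `extract_metadata` are hypothetical; real result formats and the DataFinder script API calls for storing the properties are not covered here.

```python
import re

# Hypothetical header format (assumption): lines such as "# solver = my_solver"
HEADER_LINE = re.compile(r"^#\s*(?P<key>[A-Za-z_]\w*)\s*=\s*(?P<value>.*?)\s*$")


def _convert(value: str):
    """Convert a header value to int or float where possible, else keep the string."""
    for converter in (int, float):
        try:
            return converter(value)
        except ValueError:
            pass
    return value


def extract_metadata(text: str) -> dict:
    """Collect key/value pairs from the leading comment lines of a result file."""
    properties = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        match = HEADER_LINE.match(line)
        if match:
            properties[match["key"]] = _convert(match["value"])
    return properties


result_file = """# solver = my_solver
# mach_number = 0.8
# iterations = 5000
x y cp
0.0 0.0 1.02
"""
print(extract_metadata(result_file))
# {'solver': 'my_solver', 'mach_number': 0.8, 'iterations': 5000}
```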
  • 14. DataFinder Configuration and Customization
    Example: Downloading File and Starting Application

        # Creating a file "/test.txt" using the data store "Data Store".
        from datafinder.gui.user import script_api as gui_api
        from datafinder.script_api.repository import setWorkingRepository
        from datafinder.script_api.item.item_support import createLeaf

        # Get a representation of the currently managed repository
        mr = gui_api.managedRepositoryDescription()

        # Only proceed if a repository is connected; make it the working repository
        if mr is not None:
            setWorkingRepository(mr)

            def _createLeaf():
                # System properties required for the new file
                properties = dict()
                properties["____dataformat____"] = "TEXT"
                properties["____datastorename____"] = "Data Store"
                # …
                createLeaf("/test.txt", properties)

            # Run the creation while the client shows a progress dialog
            gui_api.performWithProgressDialog(_createLeaf)
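The `____datastorename____` property in the slide 14 script links a logical item to one of the configured Data Stores, so the logical view stays the same even if the physical storage location changes. The plain-Python sketch below illustrates that indirection only; the store table, its URLs and the function `storage_url` are hypothetical and do not describe how DataFinder resolves data stores internally.

```python
from urllib.parse import quote

# Hypothetical Data Store configuration (assumption): store name -> base URL
DATA_STORES = {
    "Data Store": "https://storage.example.org/webdav/data",
    "Archive": "file:///mnt/archive",
}


def storage_url(logical_path: str, properties: dict) -> str:
    """Map a logical item path to a URL in the Data Store named in its properties."""
    store_name = properties["____datastorename____"]
    try:
        base_url = DATA_STORES[store_name]
    except KeyError:
        raise ValueError(f"unknown data store {store_name!r}") from None
    # quote() percent-encodes spaces etc. but keeps "/" as the path separator
    return base_url.rstrip("/") + quote(logical_path)


print(storage_url("/test.txt", {"____datastorename____": "Data Store"}))
# https://storage.example.org/webdav/data/test.txt
```

Moving files to another storage location then only requires changing the store's entry in the table, not the logical paths users see.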
  • 15. DataFinder Demo Example
    Live demo:
    - DataFinder server structure
    - Admin client: showing the meta model as an XML file and in the client
    - Admin client: setting up a DataStore for development files
    - Admin client: loading a script extension
    - User client: loading a script extension
    - User client: creating a structure
    - User client: uploading an experimental file into the store
    - User client: double-clicking the file to open it
    - User client: script extension creating a file
  • 16. Availability
    - DataFinder core available as Open Source
    - Current stable release: DataFinder 2.0
    - Simplified BSD License
    - Open Source platforms: Launchpad, SourceForge, Freshmeat
    - Precompiled versions for Windows XP and SLED 64-bit
    - Become a DataFinder fan on Facebook!
  • 17. Links
    - DataFinder web site: http://www.dlr.de/datafinder
    - DataFinder Open Source: http://sourceforge.net/projects/datafinder, http://launchpad.net/datafinder
    - DataFinder Wiki: http://wiki.sistec.dlr.de/DataFinderOpenSource
    - Catacomb (recommended server): http://catacomb.tigris.org
