eResearch workflows for studying free and open source software development
James Howison, Andrea Wiggins, & Kevin Crowston
Syracuse University School of Information Studies
9 September 2008 ~ IFIP 2.13 - OSS 2008
eResearch
  • Scientific practices and technologies permitting distributed research collaborations using:
      • Large data sets
      • Computational resources
      • Analysis tools and workflows
  • Replicable research with provenance metadata
  • FLOSS research community is starting to move in this direction
Current FLOSS Research Practices
  • Data increasingly available in “repositories of repositories”
      • FLOSSmole
      • FLOSSmetrics
      • SRDA (Notre Dame)
  • Not much sharing of analysis or methods for calculating measures; mostly bespoke scripts
FLOSS Research Repositories
  • Project demographics
      • FLOSSmole: Basic data
      • SRDA (Notre Dame): Basic data
      • FLOSSmetrics & CVSAnalY: Basic data & confirmed locations
  • Developer demographics
      • FLOSSmole: Memberships & roles
      • SRDA (Notre Dame): Memberships & roles
      • FLOSSmetrics & CVSAnalY: Memberships & roles
  • Communication venues
      • FLOSSmole: Releases; in progress: mail lists, forums, trackers
      • SRDA (Notre Dame): Forums & trackers
      • FLOSSmetrics & CVSAnalY: Planned/pilot: mail lists & trackers
  • Software venues
      • FLOSSmole: SVN/CVS count, packages, releases & dates
      • SRDA (Notre Dame): SVN/CVS count, packages, releases & dates
      • FLOSSmetrics & CVSAnalY: SVN/CVS full, size, packages, releases & dates
      • Qualoss & SQO-OSS: Planned/pilot: size & complexity metrics
      • Source kibitzer: Size & complexity metrics
  • Use & popularity
      • FLOSSmole: Downloads, pageviews, ratings, in Debian, partial: actual use
      • SRDA (Notre Dame): Downloads & pageviews
      • FLOSSmetrics & CVSAnalY: In Debian
  • Sample collected
      • FLOSSmole: SourceForge, RubyForge, ObjectWeb, Debian, freshmeat
      • SRDA (Notre Dame): SourceForge
      • FLOSSmetrics & CVSAnalY: Partial: SourceForge, ObjectWeb, Apache, GNOME
      • Qualoss & SQO-OSS: Pilot: KDE
      • Source kibitzer: Java projects plus user contributions
FLOSS Research Today
FLOSS Research Tomorrow?
Scientific Workflow Tools
  • Tools for scientific analysis (e.g. Kepler, Taverna)
  • Self-documenting analysis
      • Analysis conditions recorded at time of execution
  • Steps in workflow executed by components with multiple input and output ports
  • Components linked by joining input and output ports
  • Supports modular analysis development, associated with easier collaboration and higher quality products
  • Represented as flow diagram, stored and shared as single XML file
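The port-and-link model described on this slide can be illustrated with a minimal sketch. This is not Taverna's or Kepler's API; the `Component` and `Workflow` classes, the `split` and `count` steps, and the log format are all hypothetical names chosen for illustration. The sketch only shows the idea of components with named ports, links joining an output port to an input port, and inputs being recorded at execution time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Component:
    """One workflow step: named input ports in, named output ports out."""
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[..., Dict[str, object]]


@dataclass
class Workflow:
    components: Dict[str, Component] = field(default_factory=dict)
    # Each link joins (source component, output port) to (target component, input port).
    links: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def link(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        self.links.append((src, out_port, dst, in_port))

    def execute(self, constants):
        """Run each component once all of its input ports have values.

        `constants` maps (component, input port) to a fixed value.
        Returns the output port values and a log of what ran with which inputs.
        """
        values = dict(constants)  # (component, input port) -> value
        results = {}              # (component, output port) -> value
        log = []
        pending = set(self.components)
        while pending:
            ready = [n for n in sorted(pending)
                     if all((n, p) in values for p in self.components[n].inputs)]
            if not ready:
                raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
            for name in ready:
                comp = self.components[name]
                kwargs = {p: values[(name, p)] for p in comp.inputs}
                out = comp.run(**kwargs)
                log.append({"component": name, "inputs": kwargs})
                for port in comp.outputs:
                    results[(name, port)] = out[port]
                # Push outputs along every link that starts at this component.
                for src, out_port, dst, in_port in self.links:
                    if src == name:
                        values[(dst, in_port)] = out[out_port]
                pending.discard(name)
        return results, log


# Hypothetical two-step workflow: split a string, then count the pieces.
wf = Workflow()
wf.add(Component("split", ["text"], ["words"], lambda text: {"words": text.split()}))
wf.add(Component("count", ["items"], ["n"], lambda items: {"n": len(items)}))
wf.link("split", "words", "count", "items")
results, log = wf.execute({("split", "text"): "free and open source"})
print(results[("count", "n")])
```

Because each step only sees its own ports, a step can be swapped for another implementation without touching the rest of the workflow, which is the modularity the slide points to.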
Why Workflows Instead of Scripts?
Differences and advantages over scripting languages
  • Wider accessibility
      • Programming skills helpful but not necessary to start
  • Compatibility
      • Easier integration of heterogeneous components
      • Mash up and reuse existing scripts
  • Standardized metadata
  • Out-of-the-box operation
      • Requires workflow software to execute, but little other configuration is required
  • Example: BioPerl evolution from Perl scripts to libraries to workflow suites
Taverna Workbench
  • Open source stand-alone desktop application in Java
  • Can also be run via “headless” server application
  • Two main interface modes (v 1.71)
      • Design mode: workflow definition, automatically rendered diagram, available component types
      • Execution mode: process monitor, intermediate values, results, XML reports of status and all values
  • Local and remote components are connected through MIME-typed input/output ports
  • Components can also be grouped into subworkflows
Taverna Design Mode
Taverna Execution Mode
Workflow Component Types
  • Abstract & notification processors
  • String constants: file locations, threshold values, etc.
  • Beanshell: simplified Java
  • Rshell: R statistical program running through Rserve
  • Java widgets/shims for common operations: i/o, lists, text manipulation, JDBC, etc.
  • Command-line
  • Web services: WSDL with SOAP
  • Can be “scavenged” from URLs or other workflows
Example Workflow
  • Diagram colour key:
      • Teal: inputs
      • Light blue: outputs
      • Other light blue: string constant
      • Green: web services
      • Purple: Java shims
      • Yellow: RShell
  • Calculates weighted network centralization in dynamic networks, generates both numeric & graphical output
Example Output
  • Inputs
      • Sliding window size
      • Project venue
      • Date range
  • Outputs
      • Time series graph of centralizations with summary stats
      • CSV of dates and centralization values for additional analysis
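To make the inputs and outputs above concrete, here is a rough sketch of a sliding-window centralization calculation that ends in a CSV of dates and values. The slides do not give the formula, so the measure here is an assumption: a Freeman-style degree centralization computed on weighted degrees and scaled by the largest edge weight in the window. The event tuples, the names `ann`, `bob`, `cat`, `dan`, and every function name are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import date, timedelta


def weighted_degree_centralization(edge_weights):
    """Assumed measure: Freeman-style centralization on weighted degrees, in [0, 1].

    edge_weights maps an undirected pair (a, b) to a positive weight.
    """
    degree = defaultdict(float)
    for (a, b), w in edge_weights.items():
        degree[a] += w
        degree[b] += w
    n = len(degree)
    if n < 3:
        return 0.0  # undefined for fewer than three nodes; report 0
    c_max = max(degree.values())
    w_max = max(edge_weights.values())
    total_gap = sum(c_max - c for c in degree.values())
    return total_gap / ((n - 1) * (n - 2) * w_max)


def centralization_series(events, start, end, window_days, step_days=1):
    """Yield (window_end, centralization) for each window ending in [start, end].

    events is an iterable of (day, sender, receiver); each event adds 1 to
    the weight of the undirected edge between sender and receiver.
    """
    current = start
    while current <= end:
        window_start = current - timedelta(days=window_days - 1)
        weights = defaultdict(float)
        for day, sender, receiver in events:
            if window_start <= day <= current and sender != receiver:
                weights[tuple(sorted((sender, receiver)))] += 1.0
        yield current, weighted_degree_centralization(weights)
        current += timedelta(days=step_days)


def to_csv(series):
    """Render the series as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["window_end", "centralization"])
    for day, value in series:
        writer.writerow([day.isoformat(), f"{value:.3f}"])
    return buffer.getvalue()


# Hypothetical events: a star around "ann" on day 1, a triangle on day 3.
events = [
    (date(2008, 1, 1), "ann", "bob"),
    (date(2008, 1, 1), "ann", "cat"),
    (date(2008, 1, 1), "ann", "dan"),
    (date(2008, 1, 3), "bob", "cat"),
    (date(2008, 1, 3), "cat", "dan"),
    (date(2008, 1, 3), "dan", "bob"),
]
series = list(centralization_series(events, date(2008, 1, 1), date(2008, 1, 3), window_days=1))
csv_text = to_csv(series)
print(csv_text)
```

With a one-day window the star on the first day scores 1 and the evenly connected triangle scores 0; widening the window to three days merges both into a complete graph, which also scores 0. That dependence on window size is why the window is exposed as a workflow input.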
Benefits of Using Workflows
  • Modular design yields benefits in flexibility, transparency, and ease of reuse
  • Easier to co-develop and debug components, and to integrate independent efforts
  • Can quickly change strategies with minimal adjustment to existing workflow structure
  • Can reuse existing scripts and workflow components
  • Can also conduct exhaustive sensitivity testing
  • Multiple ways to achieve analysis tasks
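Exhaustive sensitivity testing amounts to re-running the same analysis over every combination of parameter values and keeping each result attached to its settings. The sketch below shows that pattern in plain Python rather than in a workflow tool; `sensitivity_sweep`, `core_size`, and the post counts are hypothetical and stand in for a real analysis and real data.

```python
from itertools import product


def sensitivity_sweep(analysis, grid):
    """Run analysis(**params) for every combination of values in grid.

    Returns a list of (params, result) pairs, ordered by sorted parameter name.
    """
    names = sorted(grid)
    runs = []
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        runs.append((params, analysis(**params)))
    return runs


# Hypothetical data: posts per contributor in two project venues.
POSTS = {
    "mail": {"ann": 40, "bob": 12, "cat": 3},
    "tracker": {"ann": 5, "dan": 9},
}


def core_size(min_posts, venue):
    """Toy analysis: how many contributors reach the posting threshold."""
    return sum(1 for count in POSTS[venue].values() if count >= min_posts)


runs = sensitivity_sweep(core_size, {"min_posts": [1, 10], "venue": ["mail", "tracker"]})
for params, result in runs:
    print(params, result)
```

Keeping the parameters next to each result makes it easy to see which conclusions hold across thresholds and venues and which depend on one particular setting.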
Conclusion
  • The combination of growing large-scale data sets and workflow tools presents a great opportunity for eResearch on FLOSS
  • Work needed for eResearch infrastructure:
      • Access to data
      • Ontologies for naming data and defining relationships
      • Incorporating metadata & social science data, e.g. content analysis schemes
  • Trade-offs involved in standardizing on tools to benefit collaboration, but much potential gain
More
  • Taverna demo screencast
      • Long version (24 minutes): floss.syr.edu/Presentations/tavernaDemoScreencast.mov
      • Short version (14 minutes): floss.syr.edu/Presentations/TavernaDemoRedux.m4v
  • MyExperiment FLOSS group: www.myexperiment.org/groups/64
  • 16:30 - 17:30 presentation today: Social Dynamics of FLOSS Team Communication across Channels
  • Tomorrow: Workshop on Public Data about Software Development (WoPDaSD 2008)
eResearch workflows for studying free and open source software development

  • 1. eResearch workflows for studying free and open source software development. James Howison, Andrea Wiggins, & Kevin Crowston, Syracuse University School of Information Studies. 9 September 2008 ~ IFIP 2.13 - OSS 2008
  • 2. eResearch: Scientific practices and technologies permitting distributed research collaborations using: large data sets; computational resources; analysis tools and workflows; replicable research with provenance metadata. The FLOSS research community is starting to move in this direction.
  • 3. Current FLOSS Research Practices: Data increasingly available in “repositories of repositories”: FLOSSmole, FLOSSmetrics, SRDA (Notre Dame). Not much sharing of analysis or methods for calculating measures; mostly bespoke scripts.
  • 4. FLOSS Research Repositories Java projects plus user contributions Pilot: KDE Partial: Sourceforge, ObjectWeb, Apache, GNOME Sourceforge Sourceforge, Rubyforge, ObjectWeb, Debian, freshmeat Sample collected In Debian Downloads & pageviews Downloads, pageviews, ratings, in Debian, partial: actual use Use & popularity Size & complexity metrics Planned/pilot: size & complexity metrics SVN/CVS full, size, packages, releases & dates SVN/CVS count, packages, releases & dates SVN/CVS count, packages, releases & dates Software venues Planned/pilot: mail lists & trackers Forums & trackers Releases; in progress: mail lists, forums, trackers Communication venues Memberships & roles Memberships & roles Memberships & roles Developer demographics Basic data & confirmed locations Basic data Basic data Project Demographics Source kibitzer Qualoss & SQO-OSS FLOSSmetrics & CVSAnalY SRDA (Notre Dame) FLOSSmole
  • 7. Scientific Workflow Tools: Tools for scientific analysis (e.g. Kepler, Taverna). Self-documenting analysis: analysis conditions recorded at time of execution. Steps in workflow executed by components with multiple input and output ports. Components linked by joining input and output ports. Supports modular analysis development, associated with easier collaboration and higher quality products. Represented as flow diagram, stored and shared as single XML file.
  • 8. Why Workflows Instead of Scripts? Differences and advantages over scripting languages. Wider accessibility: programming skills helpful but not necessary to start. Compatibility: easier integration of heterogeneous components; mash up and reuse existing scripts. Standardized metadata. Out-of-the-box operation: requires workflow software to execute, but little other configuration is required. Example: BioPerl evolution from Perl scripts to libraries to workflow suites.
  • 9. Taverna Workbench: Open source stand-alone desktop application in Java. Can also be run via “headless” server application. Two main interface modes (v 1.71). Design mode: workflow definition, automatically rendered diagram, available component types. Execution mode: process monitor, intermediate values, results, XML reports of status and all values. Local and remote components are connected through input/output ports with MIME typing. Can also be grouped into subworkflows.
  • 12. Workflow Component Types: Abstract & notification processors. String constants: file locations, threshold values, etc. Beanshell: simplified Java. Rshell: R statistical program running through Rserve. Java widgets/shims for common operations: i/o, lists, text manipulation, JDBC, etc. Command-line. Web services: WSDL with SOAP; can be “scavenged” from URLs or other workflows.
  • 13. Example Workflow. Teal: inputs; light blue: outputs; other light blue: string constant; green: web services; purple: Java shims; yellow: RShell. Calculates weighted network centralization in dynamic networks, generates both numeric & graphical output.
  • 14. Example Output. Inputs: sliding window size, project venue, date range. Outputs: time series graph of centralizations with summary stats; CSV of dates and centralization values for additional analysis.
  • 15. Benefits of Using Workflows: Modular design yields benefits in flexibility, transparency, and ease of reuse. Easier to co-develop and debug components, and to integrate independent efforts. Can quickly change strategies with minimal adjustment to existing workflow structure. Can reuse existing scripts and workflow components. Can also conduct exhaustive sensitivity testing. Multiple ways to achieve analysis tasks.
  • 16. Conclusion: The combination of growing large-scale data sets and workflow tools presents a great opportunity for eResearch on FLOSS. Work needed for eResearch infrastructure: access to data; ontologies for naming data and defining relationships; incorporating metadata & social science data, e.g. content analysis schemes. Trade-offs involved in standardizing on tools to benefit collaboration, but much potential gain.
  • 17. More Taverna demo screencast Long version (24 minutes): floss.syr.edu/Presentations/tavernaDemoScreencast.mov Short version (14 minutes): floss.syr.edu/Presentations/TavernaDemoRedux.m4v MyExperiment FLOSS group www.myexperiment.org/groups/64 16:30 - 17:30 presentation today: Social Dynamics of FLOSS Team Communication across Channels Tomorrow: Workshop on Public Data about Software Development (WoPDaSD 2008)
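
Slide 7 describes workflow steps as components joined through input and output ports, with analysis conditions recorded at execution time. The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Component` and `Workflow` classes, their methods, and the log format are hypothetical names invented here, not Taverna's or Kepler's API or file format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Component:
    """A workflow step with named input and output ports (hypothetical model)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[..., Dict[str, object]]


@dataclass
class Workflow:
    components: Dict[str, Component] = field(default_factory=dict)
    # Each link is (source component, output port, target component, input port).
    links: List[Tuple[str, str, str, str]] = field(default_factory=list)
    # Record of what ran, with which inputs and outputs, in execution order.
    log: List[dict] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def link(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        self.links.append((src, out_port, dst, in_port))

    def execute(self, constants: Dict[Tuple[str, str], object]) -> Dict[str, dict]:
        """Run every component once all of its input ports have a value.

        `constants` maps (component name, input port) to a fixed value,
        playing the role of workflow inputs and string constants.
        """
        values = dict(constants)
        pending = dict(self.components)
        results: Dict[str, dict] = {}
        while pending:
            ready = [c for c in pending.values()
                     if all((c.name, port) in values for port in c.inputs)]
            if not ready:
                raise ValueError("unconnected input port or cyclic links")
            for comp in ready:
                args = {port: values[(comp.name, port)] for port in comp.inputs}
                out = comp.run(**args)
                missing = set(comp.outputs) - set(out)
                if missing:
                    raise ValueError(f"{comp.name} did not fill ports {missing}")
                self.log.append(
                    {"component": comp.name, "inputs": args, "outputs": out})
                # Push each output value along the links that leave this component.
                for src, out_port, dst, in_port in self.links:
                    if src == comp.name:
                        values[(dst, in_port)] = out[out_port]
                results[comp.name] = out
                del pending[comp.name]
        return results
```

A two-step workflow would add a counting component and a threshold component, link the first component's `total` output port to the second's `total` input port, and call `execute` with the message list and the threshold supplied as constants. The `log` list then holds one entry per executed component.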
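
The example workflow on slides 13 and 14 takes a sliding window size and a date range, and produces a CSV of dates and centralization values. A rough Python sketch of that computation follows. The slides do not give the formula, so the measure below is an assumption: a Freeman-style degree centralization on message counts, normalized by `(n - 2) * d_max` so that a star network scores 1.0. The function names and the `(date, sender, recipient)` message format are likewise hypothetical, and the original workflow ran this step in R through RShell rather than in Python.

```python
import csv
import io
from collections import defaultdict
from datetime import date, timedelta


def weighted_degree_centralization(edges):
    """Centralization of an undirected weighted network.

    `edges` is an iterable of (node_a, node_b, weight). Assumed measure:
    sum of (d_max - d_i) over all nodes, divided by (n - 2) * d_max.
    """
    degree = defaultdict(float)
    for a, b, weight in edges:
        degree[a] += weight
        degree[b] += weight
    n = len(degree)
    if n < 3:
        return 0.0  # too few nodes for the measure to be meaningful
    d_max = max(degree.values())
    spread = sum(d_max - d for d in degree.values())
    return spread / ((n - 2) * d_max)


def sliding_centralization(messages, start, end, window_days, step_days=7):
    """Return [(window start date, centralization)] for each full window.

    `messages` is a list of (date, sender, recipient); `end` is exclusive.
    """
    window = timedelta(days=window_days)
    step = timedelta(days=step_days)
    rows = []
    current = start
    while current + window <= end:
        weights = defaultdict(float)
        for when, sender, recipient in messages:
            if current <= when < current + window and sender != recipient:
                # Edge weight is the number of messages between the pair.
                weights[tuple(sorted((sender, recipient)))] += 1.0
        edges = [(a, b, w) for (a, b), w in weights.items()]
        rows.append((current, weighted_degree_centralization(edges)))
        current += step
    return rows


def to_csv(rows):
    """Serialize the time series as CSV text for additional analysis."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "centralization"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data: a star around "a" in week one, a triangle in week two.
SAMPLE = [
    (date(2008, 1, 1), "a", "b"),
    (date(2008, 1, 2), "a", "c"),
    (date(2008, 1, 3), "a", "d"),
    (date(2008, 1, 8), "b", "c"),
    (date(2008, 1, 9), "c", "d"),
    (date(2008, 1, 10), "d", "b"),
]

# Sensitivity check in the spirit of slide 15: rerun with several window sizes.
SWEEP = {
    days: sliding_centralization(
        SAMPLE, date(2008, 1, 1), date(2008, 1, 15), window_days=days)
    for days in (7, 14)
}
```

With the sample data, the 7-day windows give a fully centralized first week and a flat second week, while a single 14-day window averages the two patterns away. This is the kind of dependence on window size that a sensitivity sweep is meant to expose.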