Perceval, Graal and Arthur
The Quest for Software Project Data
Jesus M. Gonzalez-Barahona, Santiago Dueñas, Valerio Cosentino
@jgbarah, @sduenasd, @_valcos_
[jgb, sduenas, valcos] at bitergia.com
https://speakerdeck.com/bitergia
Open Source Summit North America,
Vancouver,
August 29th-31st, 2018
/chaoss
/grimoirelab
Software development
analytics with
free, open source software
(a CHAOSS project)
chaoss.github.io/grimoirelab
chaoss.github.io/grimoirelab-tutorial
/bitergia
Software Development Analytics
for your peace of mind
Outline
Context
Perceval
Graal
Arthur
Exploitation
/context
/context
Are there new contributors?
/context
How many bugs were fixed in the past month?
/context
Has our gender diversity increased lately?
/context
Perceval gathers data for you
chaoss/grimoirelab-perceval
/perceval
/perceval
Backend: the gathering process for a specific data
source, with incremental and archiving mechanisms.
Client: handles the complexities of querying the data
source (e.g., pagination, OAuth access) and
connection problems.
CommandLine: sets up the parameters to execute a
backend, plus optional arguments such as help and
debug.
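The split between the three components can be sketched roughly as below. This is a minimal illustration, not Perceval's actual code: the class names, `FAKE_PAGES`, and `parse_args` are hypothetical, and the "data source" is an in-memory list instead of a remote API.

```python
import argparse
from datetime import datetime, timezone

# Hypothetical stand-in for a remote data source that returns pages of items.
FAKE_PAGES = [
    [{"id": 1, "updated_at": "2017-02-27T10:00:00Z"},
     {"id": 2, "updated_at": "2017-03-02T09:51:49Z"}],
    [{"id": 3, "updated_at": "2017-03-05T08:00:00Z"}],
]


class Client:
    """Hides data source details such as pagination (hypothetical)."""

    def __init__(self, pages):
        self.pages = pages

    def items(self):
        for page in self.pages:  # one "request" per page
            yield from page


class Backend:
    """Drives the gathering process, with incremental retrieval."""

    def __init__(self, client):
        self.client = client

    def fetch(self, from_date=None):
        for item in self.client.items():
            updated = datetime.strptime(
                item["updated_at"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            # Incremental mode: skip items not updated since from_date.
            if from_date and updated < from_date:
                continue
            yield item


def parse_args(argv):
    """Command line layer: turns arguments into backend parameters."""
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument("--from-date", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)
    if args.from_date:
        args.from_date = datetime.strptime(
            args.from_date, "%Y-%m-%d"
        ).replace(tzinfo=timezone.utc)
    return args


args = parse_args(["--from-date=2017-03-01"])
backend = Backend(Client(FAKE_PAGES))
fetched_ids = [item["id"] for item in backend.fetch(args.from_date)]
print(fetched_ids)
```

Keeping pagination inside the client means the backend only has to reason about items and dates.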
/perceval As a program
$ pip3 install perceval
$ perceval github chaoss grimoirelab-perceval
--from-date=2017-03-01 --api-token=5…
[2018-08-01 12:31:10] - Sir Perceval is on his quest.
[2018-08-01 12:31:11] - Getting info for https://api.github.com/users/jgbarah
[2018-08-01 12:31:12] - Getting info for https://api.github.com/users/sduenas
… producing JSON documents …
[2018-08-01 12:34:22] - Sir Perceval completed his quest
/perceval
As a Python3 library
import datetime

from perceval.backends.core.github import GitHub

# Fetch only issues updated since this date
from_date = datetime.datetime(2017, 3, 1)
github = GitHub("chaoss", "grimoirelab-perceval",
                api_token="5e3d7...")
for issue in github.fetch(from_date=from_date):
    print(issue['data'])
/perceval
{ "backend_name": "GitHub",
  "backend_version": "0.17.2",
  "category": "issue",
  "data": {
    "comments": 5,
    "comments_data": {...},
    "created_at": "2017-02-28T05:33:10Z",
    ....,
    "id": 210691361,
    "state": "closed",
    "title": "..",
    "updated_at": "2017-03-02T09:51:49Z"
  },
  "origin": "https://github.com/chaoss/grimoirelab-perceval",
  "perceval_version": "0.11.7",
  "tag": "https://github.com/chaoss/grimoirelab-perceval",
  "timestamp": 1517479766.878609,
  "updated_on": 1488448309.0,
  "uuid": "77a42463d5d0a34b1d58006263b85909a9788b52" }
[Slide annotations: Data source data, Perceval data]
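Documents of this shape are easy to query. The sketch below is one possible way to approach a question like "how many bugs were fixed in the past month?": it counts closed issues updated since a given date. The sample items are hand-written to mimic the fields shown on the slide, and `closed_since` is a hypothetical helper, not part of Perceval.

```python
from datetime import datetime, timezone

# Hand-written sample items mimicking the fields shown on the slide;
# the values are illustrative, not real Perceval output.
items = [
    {"category": "issue",
     "data": {"id": 1, "state": "closed"},
     "updated_on": 1488448309.0},
    {"category": "issue",
     "data": {"id": 2, "state": "open"},
     "updated_on": 1488500000.0},
    {"category": "issue",
     "data": {"id": 3, "state": "closed"},
     "updated_on": 1480000000.0},
]


def closed_since(items, since):
    """Return the ids of closed issues updated on or after `since`."""
    ids = []
    for item in items:
        # "updated_on" is a Unix timestamp in seconds.
        updated = datetime.fromtimestamp(item["updated_on"], tz=timezone.utc)
        if item["data"]["state"] == "closed" and updated >= since:
            ids.append(item["data"]["id"])
    return ids


since = datetime(2017, 3, 1, tzinfo=timezone.utc)
closed_ids = closed_since(items, since)
print(closed_ids)
```

With a live backend, the `items` list would be replaced by the generator returned by `fetch()`.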
/graal
No code-based analysis with Perceval:
- Code complexity
- License evolution
- ...
/graal
Graal does it!
It leverages the incremental functionality
of Perceval and extends its handling of
Git repositories to process source code
valeriocos/graal
/graal
/graal
Filter: selects commits based on the Git JSON
documents. Each selected commit is
checked out in the working tree.
Analysis: executes analysis tools in the
working tree. The results of the analysis are
automatically embedded in the JSON
document.
Post-process: alters the attributes of the
inflated JSON document, giving the
user complete control over the output.
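The three stages can be sketched as plain functions chained over commit documents. This is only an illustration under simplifying assumptions: the function names and the `files` field are hypothetical, sources are carried inline instead of being checked out in a working tree, and a toy line counter stands in for a real analysis tool.

```python
# Hypothetical commit documents; sources are carried inline here, whereas
# the slides describe checking commits out in a working tree.
commits = [
    {"commit": "a1", "files": {"app.py": "# main\nx = 1\ny = 2\n"}},
    {"commit": "b2", "files": {"README.md": "docs only\n"}},
]


def filter_commit(commit):
    """Filter: keep only commits that touch Python files."""
    return any(path.endswith(".py") for path in commit["files"])


def analyze(commit):
    """Analysis: a toy line counter standing in for a real analysis tool."""
    results = []
    for path, source in commit["files"].items():
        if not path.endswith(".py"):
            continue
        lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
        comments = sum(1 for ln in lines if ln.startswith("#"))
        results.append({"file_path": path,
                        "loc": len(lines) - comments,
                        "comments": comments})
    return results


def post_process(document):
    """Post-process: drop attributes not wanted in the output."""
    document.pop("files", None)
    return document


def run(commits):
    for commit in commits:
        if not filter_commit(commit):
            continue
        document = dict(commit)                 # shallow copy of the commit
        document["analysis"] = analyze(commit)  # embed results in the document
        yield post_process(document)


documents = list(run(commits))
print(documents)
```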
/graal
Backends: Code Complexity, Dependencies, Quality checks, Security
/graal As a program
$ pip install graal
$ sudo apt install cloc
$ graal cocom
https://github.com/chaoss/grimoirelab-perceval
--git-path /tmp/graal-cocom
[2018-05-30 18:22:35,643] - Starting the quest for the
Graal.
[2018-05-30 18:22:39,958] - Git worktree /tmp/...
created!
[2018-05-30 18:22:39,959] - Fetching commits
… producing JSON documents …
[2018-05-31 04:51:56,112] - Quest completed.
/graal
As a Python3 library
from graal.backends.core.cocom import CoCom

repo_uri = "https://github.com/chaoss/grimoirelab-perceval"
repo_dir = "/tmp/graal-cocom"
cc = CoCom(uri=repo_uri, gitpath=repo_dir)
commits = [commit for commit in cc.fetch()]
/graal
{ "backend_name": "CoCom",
  "backend_version": "0.2.1",
  "category": "code_complexity",
  "data": {
    "AuthorDate": "Mon May 28 ...",
    "CommitDate": "Mon May 28 ...",
    "commit": "dc78c25….",
    "message": "Increase coverage ...",
    "analysis": [{
      "ccn": 80, "num_funs": 33, …,
      "comments": "153", "loc": 341,
      "file_path": "perceval/backend.py"
    }], ...
  },
  "origin": "https://github.com/chaoss/grimoirelab-perceval",
  "graal_version": "0.1.0",
  "tag": "https://github.com/chaoss/grimoirelab-perceval",
  "timestamp": 1534782753.878609,
  "updated_on": 1392000436.0,
  "uuid": "77a42463d5d0a34b1d58006263b85909a9788b52" }
[Slide annotations: Git metadata, Code data, Graal data]
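Since each document carries one analysis entry per file, per-commit figures can be derived by aggregating them. The sketch below assumes the field names shown on the slide (`ccn`, `num_funs`, `loc`); the sample item and the `summarize` helper are hypothetical.

```python
# A hand-written item mimicking the CoCom fields shown on the slide;
# the second file entry and the commit id are made up for illustration.
item = {
    "data": {
        "commit": "abc123",
        "analysis": [
            {"file_path": "pkg/a.py", "ccn": 80, "num_funs": 33, "loc": 341},
            {"file_path": "pkg/b.py", "ccn": 20, "num_funs": 7, "loc": 59},
        ],
    }
}


def summarize(item):
    """Aggregate the per-file metrics of one commit document."""
    analysis = item["data"]["analysis"]
    total_ccn = sum(entry["ccn"] for entry in analysis)
    total_funs = sum(entry["num_funs"] for entry in analysis)
    total_loc = sum(entry["loc"] for entry in analysis)
    return {
        "commit": item["data"]["commit"],
        "files": len(analysis),
        "loc": total_loc,
        "ccn": total_ccn,
        # Average cyclomatic complexity per function; None if no functions.
        "ccn_per_fun": total_ccn / total_funs if total_funs else None,
    }


summary = summarize(item)
print(summary)
```

Tracking such a summary across commits would give a rough view of how complexity evolves over the history of a repository.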
/arthur
Arthur schedules and runs Perceval (and Graal)
executions at scale through distributed Redis queues.
chaoss/grimoirelab-kingarthur
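The queue-based idea can be illustrated without a Redis server. The sketch below is not Arthur's API: it swaps in Python's standard `queue.Queue` and threads as stand-ins for the distributed Redis queues and workers, and `fake_fetch` is a hypothetical placeholder for a Perceval or Graal execution.

```python
import queue
import threading

jobs = queue.Queue()     # stand-in for a Redis job queue
results = queue.Queue()  # stand-in for a Redis results queue


def fake_fetch(backend, origin):
    """Hypothetical placeholder for a Perceval or Graal execution."""
    return {"backend": backend, "origin": origin, "items": 3}


def worker():
    """Take jobs from the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results.put(fake_fetch(**job))
        jobs.task_done()


threads = [threading.Thread(target=worker) for _ in range(2)]
for thread in threads:
    thread.start()

# Schedule one job per repository, then one sentinel per worker.
for origin in ["repo-a", "repo-b", "repo-c"]:
    jobs.put({"backend": "git", "origin": origin})
for _ in threads:
    jobs.put(None)

jobs.join()
for thread in threads:
    thread.join()

done = sorted(results.get()["origin"] for _ in range(3))
print(done)
```

Because workers only share the queues, adding more of them (or moving them to other machines, with a real queue server) would not change the scheduling side.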
/exploitation
/resources
chaoss/grimoirelab-perceval
chaoss/grimoirelab-kingarthur
valeriocos/graal
grimoirelab.github.io
Perceval
Graal
@grimoirelab
@bitergia
@CHAOSSproj
