Contributing to Apache Airflow
Airflow Summit
8 July 2021
Kaxil Naik
Airflow Committer and PMC Member
OSS Airflow Team @ Astronomer
Who am I?
● Airflow Committer & PMC Member
● Manager of Airflow Engineering team @ Astronomer
○ Work full-time on Airflow
● Previously worked at DataReply
● Masters in Data Science & Analytics from Royal
Holloway, University of London
● Twitter: https://twitter.com/kaxil
● Github: https://github.com/kaxil/
● LinkedIn: https://www.linkedin.com/in/kaxil/
Agenda
● My Journey
● How to start contributing
● Communication channels
● Guidelines to become a committer
http://gph.is/1VBGIPv
My Journey
Motivation to contribute!
https://stackoverflow.com/q/47452879/5691525
Motivation to contribute!
https://stackoverflow.com/a/47452939/5691525
But it didn’t work …
Fixed it - My First PR
My First PR - Fixes Typo
My First PR - I didn’t follow Guidelines !!
https://media.giphy.com/media/KSKvdT1YGCpUIonvSq/giphy.gif
My First “Merged” PR/commit
http://gph.is/15RTH5O
Slowly & Steadily started adding more contributions
Became Airflow Committer & (P)PMC Member
https://twitter.com/ApacheAirflow/status/993950478785490945
Steered Release for Airflow 1.10.2
Became Leading Airflow Committer in Feb 2021
What did I learn by working on Airflow?
● Writing unit-tests
● Improved Coding skills
● Got to know many companies & devs across the globe
● Improved communication skills
○ Commit messages & PR descriptions
○ Email threads on dev list
○ Presentations (Public Speaking was one of my fears !!)
You are next !!
How to start contributing?
● Contributing Guidelines: CONTRIBUTING.rst
● Contributing Quick Start Guide: CONTRIBUTORS_QUICK_START.rst
● Good First Issues: https://github.com/apache/airflow/contribute
https://gph.is/g/ZWdK71X
Contribution Workflow
1. Find the issue you want to work on
2. Setup a local dev environment
3. Understand the codebase
4. Write Code & add tests
5. Run tests locally
6. Create PR and wait for reviews
7. Address any suggestions by reviewers
8. Nudge politely if your PR is pending reviews for a while
Finding issues to work on
● Start small: the aim should be to understand the process
● Bugs / features impacting you or your work
● Documentation Issues (including Contribution Guides)
○ Missing or outdated info, typos, formatting issues, broken links etc
● Good First Issues: https://github.com/apache/airflow/contribute
● Other open GitHub Issues: https://github.com/apache/airflow/issues
Finding issues to work on - Open Unassigned Issues
If the issue is open and unassigned, comment that you want to work on it.
A committer will assign the issue to you. Then it is all yours.
Finding issues to work on - Improving Documentation
● If you faced an issue with docs, fix it for future readers
● Documentation PRs are great first contributions
● Missing or outdated info, typos, formatting issues, broken links etc
● No need to write unit tests
● Examples:
○ https://github.com/apache/airflow/pull/16275
○ https://github.com/apache/airflow/pull/13462
○ https://github.com/apache/airflow/pull/15265
Setup a local dev environment
Set Up Local Development Environment
● Fork Apache Airflow repo & clone it locally
● Install pre-commit hooks to detect minor issues before creating a PR
○ Some of them even fix issues automatically, e.g. ‘black’ formats Python code
○ Install pre-commit framework: pip install pre-commit
○ Install pre-commit hooks: pre-commit install
● Use breeze - a wrapper around docker-compose for Airflow development.
○ Mac Users: Increase resources available to Docker for Mac
○ Check Prerequisites: https://github.com/apache/airflow/blob/main/BREEZE.rst#prerequisites
○ Setup autocomplete: ./breeze setup-autocomplete
Set Up Local Development Environment - Breeze
● Airflow CI uses breeze too, so CI failures can be reproduced locally
● Allows running Airflow with different environments (different Python versions,
different Metadata db, etc):
○ ./breeze --python 3.6 --backend postgres --postgres-version 12
● You can also run a local instance of Airflow using:
○ ./breeze start-airflow --python 3.6 --backend postgres
● You can then access the Webserver on http://localhost:28080
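The idea of running the same checks across several environments can be sketched as a small matrix builder. This is only an illustration: it uses the --python and --backend flags shown above, and the version and backend values in the lists are assumptions that should be checked against BREEZE.rst for your checkout.

```python
from __future__ import annotations

from itertools import product

# Illustrative values only (assumption): confirm the supported Python
# versions and backends in BREEZE.rst before relying on them.
PYTHON_VERSIONS = ["3.6", "3.7"]
BACKENDS = ["sqlite", "postgres"]


def breeze_command(python: str, backend: str) -> list[str]:
    """Build the argument list for one Breeze environment."""
    return ["./breeze", "--python", python, "--backend", backend]


def environment_matrix() -> list[list[str]]:
    """Return one Breeze command per (Python version, backend) pair."""
    return [breeze_command(py, be) for py, be in product(PYTHON_VERSIONS, BACKENDS)]


if __name__ == "__main__":
    for argv in environment_matrix():
        print(" ".join(argv))
```

Each printed line is a command you could paste into a terminal from the repository root.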
Understand the Codebase
● apache/airflow is a mono-repo containing code for:
○ Apache Airflow Python package
○ More than 60 Providers (Google, Amazon, Postgres, etc)
○ Container image
○ Helm Chart
● Each of these items is released and versioned separately
● The contribution process is the same for the entire repo
Understand the Codebase
● Do not try to understand the entire codebase at once
● Get familiar with the directory structure first
● Dive into the source code related to your issue
● It is like moving to a new house: you first get to know your immediate neighbours,
and only then the others (unless you have a memory like Sheldon Cooper!)
http://gph.is/2F2nUVb
Understand the Codebase - Directory Structure
Area | Paths (relative to the repository root)
Core Airflow Docs | docs/apache-airflow
Stable API | airflow/api_connexion
CLI | airflow/cli
Webserver / UI | airflow/www
Scheduler | airflow/jobs/scheduler_job.py
Dag Parsing | airflow/dag_processing
Executors | airflow/executors
DAG Serialization | airflow/serialization
Helm Chart (& its tests) | chart
Container Image | Dockerfile
Tests | tests
Understand the Codebase - Directory Structure
Area | Paths (relative to the repository root)
Providers | airflow/providers
Core Operators | airflow/operators
Core Hooks | airflow/hooks
Core Sensors | airflow/sensors
DB Migrations | airflow/migrations
ORM Models (Python Class -> DB Tables) | airflow/models
Secrets Backend | airflow/secrets
Configuration | airflow/configuration.py
Permission Model | airflow/www/security.py
All Docs (incl. docs for Chart & Container image) | docs
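One way to use the directory listings above is as a lookup from a changed file to the area it belongs to. The sketch below is a hypothetical helper, not part of Airflow: the paths are copied from the listings, and the function name, the longest-prefix rule and the sample file names are assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Paths copied from the directory listings; all are relative to the repo root.
AREAS = {
    "docs/apache-airflow": "Core Airflow Docs",
    "airflow/api_connexion": "Stable API",
    "airflow/cli": "CLI",
    "airflow/www": "Webserver / UI",
    "airflow/jobs/scheduler_job.py": "Scheduler",
    "airflow/dag_processing": "Dag Parsing",
    "airflow/executors": "Executors",
    "airflow/serialization": "DAG Serialization",
    "chart": "Helm Chart",
    "Dockerfile": "Container Image",
    "tests": "Tests",
    "airflow/providers": "Providers",
    "airflow/operators": "Core Operators",
    "airflow/hooks": "Core Hooks",
    "airflow/sensors": "Core Sensors",
    "airflow/migrations": "DB Migrations",
    "airflow/models": "ORM Models",
    "airflow/secrets": "Secrets Backend",
    "airflow/configuration.py": "Configuration",
    "airflow/www/security.py": "Permission Model",
    "docs": "All Docs",
}


def area_for_path(path: str) -> str | None:
    """Return the area whose listed path is the longest prefix of `path`.

    Matching is done on whole path components, so "docs/apache-airflow"
    does not match "docs/apache-airflow-providers-google/...".
    """
    parts = PurePosixPath(path).parts
    best_len, best_area = 0, None
    for prefix, area in AREAS.items():
        prefix_parts = PurePosixPath(prefix).parts
        if parts[: len(prefix_parts)] == prefix_parts and len(prefix_parts) > best_len:
            best_len, best_area = len(prefix_parts), area
    return best_area


if __name__ == "__main__":
    # Sample paths are illustrative.
    for sample in ["airflow/www/security.py", "airflow/www/views.py", "setup.py"]:
        print(sample, "->", area_for_path(sample))
```

The longest match wins so that a specific entry such as airflow/www/security.py takes precedence over the broader airflow/www.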
Understand the Codebase - Areas
● Get expertise in a certain area before diving into a different one.
Easy | Medium | Complex (core)
Docs | Webserver | Scheduler
CLI | Helm Chart | Executors
Operators / Hooks / Sensors (Providers) | Dockerfile | Configuration
Stable API | Secrets Backend | Permission Model
- | DB Migrations | Dag Parsing
Write Code, add docs & tests
Write code
● Take inspiration from existing code
● E.g. when writing a hook, look at:
○ Code for other similar hooks
○ PRs that added other hooks to see everything that changed including docs & tests
● Check out Coding style and best practices in CONTRIBUTING.rst
Add tests and docs
● The tests directory has the same structure as the airflow directory.
● E.g. if the code file is airflow/providers/google/cloud/operators/bigquery.py, tests for it should
be added at tests/providers/google/cloud/operators/test_bigquery.py
● Docs for it would be at
docs/apache-airflow-providers-google/operators/cloud/bigquery.rst
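The source-to-test path convention described above can be sketched as a small helper. The function name is hypothetical, and the rule it encodes (swap the leading airflow directory for tests and prefix the file name with test_) is only the convention as stated here; individual files in the repository may deviate from it.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path: str) -> str:
    """Map a module under airflow/ to the test file the convention suggests."""
    src = PurePosixPath(source_path)
    if not src.parts or src.parts[0] != "airflow":
        raise ValueError(f"expected a path under airflow/, got {source_path!r}")
    # Keep the intermediate directories, swap the root, prefix the file name.
    return str(PurePosixPath("tests", *src.parts[1:-1], f"test_{src.name}"))


if __name__ == "__main__":
    print(expected_test_path("airflow/providers/google/cloud/operators/bigquery.py"))
```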
Run tests locally
Run tests locally - Single Test
● Start breeze: ./breeze --backend postgres --python 3.7
● Run a single test from a file:
pytest tests/secrets/test_secrets.py -k test_backends_kwargs
Run tests locally - Multiple Tests
● Start breeze: ./breeze --backend postgres --python 3.7
● Run all tests in a file:
pytest tests/secrets/test_secrets.py
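For readers new to pytest, here is a minimal sketch of what a test file can look like. Everything in it is hypothetical (the chunked helper and both tests are not Airflow code); it only illustrates that pytest collects functions whose names start with test_ and that plain assert statements are enough.

```python
def chunked(items, size):
    """Hypothetical helper under test: split a list into chunks of `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_chunked_splits_evenly():
    assert chunked([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]


def test_chunked_keeps_remainder():
    assert chunked([1, 2, 3], 2) == [[1, 2], [3]]
```

Saved as a file whose name starts with test_, running pytest on that file executes both tests, while adding -k test_chunked_keeps_remainder runs only the second one.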
Run tests locally
● Similarly, you can run other types of tests locally:
○ Integration Tests (with Celery, Redis, etc)
○ Kubernetes Tests with the Helm Chart
○ System Tests (useful for testing providers)
● Check TESTING.rst for more details on how you can run them
Build docs locally
● If you have updated docs including docstrings, build docs locally
● Two types of tests for docs:
1. Docs are built successfully with Sphinx
2. Spelling Checks
Build docs locally
Example: If you updated Helm Chart docs (docs/helm-chart), build docs using
./breeze build-docs -- --package-filter helm-chart
Ready to commit - Static Code Checks
● Once you are happy with your code, commit it
● Pre-commit hooks will run as soon as you run git commit
● ~90 pre-commit hooks (flake8, black, mypy, trailing whitespace trimming, etc.)
● All these hooks are documented in STATIC_CODE_CHECKS.rst
● Fix any failing hooks and run git add . && git commit again until all pass
● These checks will be run on CI too when you create PR
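To make the "check and auto-fix" behaviour of a hook concrete, here is a simplified sketch of what a trailing-whitespace hook does. It is not the implementation used by the actual hook; the function name and the return shape are assumptions chosen for illustration.

```python
from __future__ import annotations


def trim_trailing_whitespace(text: str) -> tuple[str, bool]:
    """Strip trailing spaces and tabs from every line.

    Returns the fixed text and a flag telling whether anything changed.
    Only LF line endings are handled in this sketch.
    """
    fixed = "\n".join(line.rstrip(" \t") for line in text.split("\n"))
    return fixed, fixed != text


if __name__ == "__main__":
    fixed, changed = trim_trailing_whitespace("x = 1   \ny = 2\n")
    print(repr(fixed), changed)
```

A hook that reports a change typically fails the commit, which is why the fixed files need to be staged again before committing.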
Write a good git commit message (Very Important)
1. Separate subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how
Source: https://chris.beams.io/posts/git-commit/
Example: https://github.com/apache/airflow/commit/73b9163a8f55ce3d5bf6aec0a558952c27dd1b55
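The mechanical rules in the list above (1, 2, 3, 4 and 6) can be checked automatically; rules 5 and 7 need human judgment. The checker below is a hypothetical sketch, not a tool that Airflow ships, and the wording of its messages is an assumption.

```python
from __future__ import annotations


def check_commit_message(message: str) -> list[str]:
    """Return the rule violations found in a commit message (empty list if none)."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]
    if not subject:
        problems.append("subject is empty")
    if len(subject) > 50:
        problems.append("subject is longer than 50 characters")
    if subject and not subject[0].isupper():
        problems.append("subject is not capitalized")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1] != "":
        problems.append("no blank line between subject and body")
    # Line numbers are 1-based; the subject is line 1.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append(f"body line {number} is longer than 72 characters")
    return problems


if __name__ == "__main__":
    print(check_commit_message("fixed stuff.\nmore details"))
```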
Create PR and wait for reviews
Create PR
● Finally create a PR from your fork to apache/airflow repo
● Make sure to add PR description and title appropriately (similar to commit messages)
● You can add commits to your branch after creating the PR too
● Wait for one of the Committers to review the PR
● Reviewers of the PR might leave suggestions or ask for clarifications
● Ask for help on the PR itself if you have any questions by tagging Committers
Wait for Reviews
● Be patient: it may sometimes take several days or weeks before you get a review
● If you don’t get any reviews after a couple of weeks, you can ask in the #development
channel of the Airflow Slack Workspace.
Tests on CI
● Tests will run via GitHub Actions as soon as you create PR
● Fix any failing tests
Tests on CI
● Sometimes you might see CI failures unrelated to your PRs
● It can be due to one of the following reasons:
○ Flaky tests
○ Tests/Code on “main” branch might be broken
○ GitHub Runner failures -- these are transient errors
○ Timeouts due to no available slot to run on Workers
● Failure of “Quarantined Tests” can be ignored -- those are expected to fail randomly
When and who will merge the PR?
● One approved vote from a committer is needed before a PR can be merged
● One of the committers will merge the PR once the tests are completed
● Mention the committer who reviewed if your PR is approved but not merged for a while
Communication Channels
● Mailing Lists
○ Dev List - dev@airflow.apache.org (Public Archive Link)
■ official source for any decisions, discussions & announcements
■ "If it didn't happen on the dev list, it didn't happen"
■ Subscribe by sending email to dev-subscribe@airflow.apache.org
○ User List - users@airflow.apache.org (Public Archive Link)
● Airflow Slack Workspace: https://s.apache.org/airflow-slack (Public Archive Link)
● GitHub Discussions: https://github.com/apache/airflow/discussions
Guidelines to become a committer
Roles
● Contributors: Anyone who contributes code, documentation etc by creating PRs
● Committers: Community members that have ‘write access’ to the project’s repositories
● PMC Members: Members who are responsible for governance of the project
○ Binding votes on releases
○ Responsible for voting in new committers and PMC members to the project
○ Making sure code licenses and all ASF’s legal policies & brand are complied with
○ Dealing with vulnerability reports
How to become a Committer - Prerequisites
● Guidelines are documented at https://github.com/apache/airflow/blob/main/COMMITTERS.rst
● You can become a committer either by (1) Code Contributions or (2) Community Contributions
● Prerequisites
○ Consistent contribution over last few months
○ Visibility on discussions on the dev mailing list, Slack channels or GitHub issues/discussions
○ Contributions to community health and project's sustainability for the long-term
○ Understands contributor/committer guidelines: Contributors' Guide
How to become a Committer - Code Contributions
1. High-quality commits (especially commit messages), including upgrade paths or deprecation policies
2. Testing Release Candidates
3. Proposed and led to completion Airflow Improvement Proposal(s) - AIPs
4. Champions one of the areas in the codebase like Airflow Core, API, Docker Image, Helm Chart, etc
5. Made a significant improvement or added an integration that is important to the Airflow Ecosystem
How to become a Committer - Community contributions
1. Instrumental in triaging issues
2. Improved documentation of Airflow in a significant way
3. Led changes and improvements in the “community” processes and tools
4. Actively spreads the word about Airflow, for example organising Airflow Summit, workshops for
community members, giving and recording talks at meetups & conferences, writing blogs
5. Reporting bugs with detailed reproduction steps
Airflow Improvement Proposal (AIP)
● The purpose of an AIP is to introduce any major change to Apache Airflow, mostly ones that
require architectural changes, after planning and discussing it with the community
● Details on https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
● Proposal lifecycle:
○ Discuss - discussions on the dev mailing list
○ Draft - create a proposal on the wiki
○ Vote - vote on dev mailing list (only Committers & PMC Members have a binding vote)
○ Accepted - work is started if vote passes
○ Completed - once all PRs related to the AIP are merged
Links / References
Links
● Airflow
○ Repo: https://github.com/apache/airflow
○ Website: https://airflow.apache.org/
○ Blog: https://airflow.apache.org/blog/
○ Documentation: https://airflow.apache.org/docs/
○ Slack: https://s.apache.org/airflow-slack
○ Twitter: https://twitter.com/apacheairflow
● Contact Me:
○ Twitter: https://twitter.com/kaxil
○ Github: https://github.com/kaxil/
○ LinkedIn: https://www.linkedin.com/in/kaxil/
Thank You!
Ad

More Related Content

What's hot (20)

Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
Kaxil Naik
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
Liangjun Jiang
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Kaxil Naik
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
Laura Lorenz
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
Ilias Okacha
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
Chandler Huang
 
Airflow for Beginners
Airflow for BeginnersAirflow for Beginners
Airflow for Beginners
Varya Karpenko
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
SaarBergerbest
 
Apache airflow
Apache airflowApache airflow
Apache airflow
Purna Chander
 
Apache Airflow at Dailymotion
Apache Airflow at DailymotionApache Airflow at Dailymotion
Apache Airflow at Dailymotion
Germain Tanguy
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
Sid Anand
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
Chris Riccomini
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using it
Bruno Faria
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
Digital Vidya
 
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
Jarek Potiuk
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
Tao Feng
 
Apache airflow
Apache airflowApache airflow
Apache airflow
Pavel Alexeev
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
Walter Liu
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_on
pko89403
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Knoldus Inc.
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
Kaxil Naik
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
Liangjun Jiang
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Kaxil Naik
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
Laura Lorenz
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
Ilias Okacha
 
Apache Airflow at Dailymotion
Apache Airflow at DailymotionApache Airflow at Dailymotion
Apache Airflow at Dailymotion
Germain Tanguy
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
Sid Anand
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using it
Bruno Faria
 
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)
Jarek Potiuk
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
Tao Feng
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
Walter Liu
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_on
pko89403
 

Similar to Contributing to Apache Airflow | Journey to becoming Airflow's leading contributor (20)

Bgoug 2019.11 building free, open-source, plsql products in cloud
Bgoug 2019.11   building free, open-source, plsql products in cloudBgoug 2019.11   building free, open-source, plsql products in cloud
Bgoug 2019.11 building free, open-source, plsql products in cloud
Jacek Gebal
 
Contributing to github is for everyone
Contributing to github is for everyoneContributing to github is for everyone
Contributing to github is for everyone
Matt Heusser
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
Anubhav Jain
 
Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...
Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...
Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...
Pallavi Sharma
 
Hands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm UsersHands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm Users
Weaveworks
 
apacheairflow-160827123852.pdf
apacheairflow-160827123852.pdfapacheairflow-160827123852.pdf
apacheairflow-160827123852.pdf
vijayapraba1
 
O'Leary - Using GitHub for Enterprise and Open Source Documentation
O'Leary - Using GitHub for Enterprise and Open Source DocumentationO'Leary - Using GitHub for Enterprise and Open Source Documentation
O'Leary - Using GitHub for Enterprise and Open Source Documentation
LavaCon
 
Getting started contributing to Apache Spark
Getting started contributing to Apache SparkGetting started contributing to Apache Spark
Getting started contributing to Apache Spark
Holden Karau
 
Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018
Holden Karau
 
Untangling4
Untangling4Untangling4
Untangling4
Derek Jacoby
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
Pallavi Sharma
 
Intro. to Git and Github
Intro. to Git and GithubIntro. to Git and Github
Intro. to Git and Github
Olmo F. Maldonado
 
EuroPython 2013 - Python3 TurboGears Training
EuroPython 2013 - Python3 TurboGears TrainingEuroPython 2013 - Python3 TurboGears Training
EuroPython 2013 - Python3 TurboGears Training
Alessandro Molina
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
Derek Jacoby
 
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondGetting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Databricks
 
Autolab Workshop
Autolab WorkshopAutolab Workshop
Autolab Workshop
Mihir Pandya
 
Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...
Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...
Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...
mfrancis
 
Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...
Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...
Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...
Develcz
 
Django
DjangoDjango
Django
Ksd Che
 
Working process and git branch strategy
Working process and git branch strategyWorking process and git branch strategy
Working process and git branch strategy
Kan-Han (John) Lu
 
Bgoug 2019.11 building free, open-source, plsql products in cloud
Bgoug 2019.11   building free, open-source, plsql products in cloudBgoug 2019.11   building free, open-source, plsql products in cloud
Bgoug 2019.11 building free, open-source, plsql products in cloud
Jacek Gebal
 
Contributing to github is for everyone
Contributing to github is for everyoneContributing to github is for everyone
Contributing to github is for everyone
Matt Heusser
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
Anubhav Jain
 
Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...
Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...
Begin your journey to be a Selenium Committer - Valencia 2025 - Pallavi Sharm...
Pallavi Sharma
 
Hands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm UsersHands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm Users
Weaveworks
 
apacheairflow-160827123852.pdf
apacheairflow-160827123852.pdfapacheairflow-160827123852.pdf
apacheairflow-160827123852.pdf
vijayapraba1
 
O'Leary - Using GitHub for Enterprise and Open Source Documentation
O'Leary - Using GitHub for Enterprise and Open Source DocumentationO'Leary - Using GitHub for Enterprise and Open Source Documentation
O'Leary - Using GitHub for Enterprise and Open Source Documentation
LavaCon
 
Getting started contributing to Apache Spark
Getting started contributing to Apache SparkGetting started contributing to Apache Spark
Getting started contributing to Apache Spark
Holden Karau
 
Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018Intro - End to end ML with Kubeflow @ SignalConf 2018
Intro - End to end ML with Kubeflow @ SignalConf 2018
Holden Karau
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
Pallavi Sharma
 
EuroPython 2013 - Python3 TurboGears Training
EuroPython 2013 - Python3 TurboGears TrainingEuroPython 2013 - Python3 TurboGears Training
EuroPython 2013 - Python3 TurboGears Training
Alessandro Molina
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
Derek Jacoby
 
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondGetting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Databricks
 
Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...
Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...
Improved developer productivity thanks to Maven and OSGi - Lukasz Dywicki (Co...
mfrancis
 
Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...
Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...
Daniel Steigerwald: EsteJS - javascriptové aplikace robusně, modulárně a komf...
Develcz
 
Working process and git branch strategy
Working process and git branch strategyWorking process and git branch strategy
Working process and git branch strategy
Kan-Han (John) Lu
 
Ad

More from Kaxil Naik (8)

Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
Kaxil Naik
 
Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...
Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...
Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...
Kaxil Naik
 
Airflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable OperatorsAirflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable Operators
Kaxil Naik
 
Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?
Kaxil Naik
 
What's new in Airflow 2.3?
What's new in Airflow 2.3?What's new in Airflow 2.3?
What's new in Airflow 2.3?
Kaxil Naik
 
Upgrading to Apache Airflow 2 | Airflow Summit 2021
Upgrading to Apache Airflow 2 | Airflow Summit 2021Upgrading to Apache Airflow 2 | Airflow Summit 2021
Upgrading to Apache Airflow 2 | Airflow Summit 2021
Kaxil Naik
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2
Kaxil Naik
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
Kaxil Naik
 
Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...
Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...
Introducing airflowctl: A CLI to streamline getting started with Airflow - Ai...
Kaxil Naik
 
Airflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable OperatorsAirflow: Save Tons of Money by Using Deferrable Operators
Airflow: Save Tons of Money by Using Deferrable Operators
Kaxil Naik
 
Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?Why Airflow? & What's new in Airflow 2.3?
Why Airflow? & What's new in Airflow 2.3?
Kaxil Naik
 
What's new in Airflow 2.3?
What's new in Airflow 2.3?What's new in Airflow 2.3?
What's new in Airflow 2.3?
Kaxil Naik
 
Upgrading to Apache Airflow 2 | Airflow Summit 2021
Upgrading to Apache Airflow 2 | Airflow Summit 2021Upgrading to Apache Airflow 2 | Airflow Summit 2021
Upgrading to Apache Airflow 2 | Airflow Summit 2021
Kaxil Naik
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2
Kaxil Naik
 
Contributing to Apache Airflow | Journey to becoming Airflow's leading contributor

  • 1. Contributing to Apache Airflow
      Airflow Summit, 8 July 2021
      Kaxil Naik, Airflow Committer and PMC Member
      OSS Airflow Team @ Astronomer
  • 2. Who am I?
      ● Airflow Committer & PMC Member
      ● Manager of Airflow Engineering team @ Astronomer
          ○ Work full-time on Airflow
      ● Previously worked at DataReply
      ● Masters in Data Science & Analytics from Royal Holloway, University of London
      ● Twitter: https://twitter.com/kaxil
      ● Github: https://github.com/kaxil/
      ● LinkedIn: https://www.linkedin.com/in/kaxil/
  • 3. Agenda
      ● My Journey
      ● How to start contributing
      ● Communication channels
      ● Guidelines to become a committer
      http://gph.is/1VBGIPv
  • 5. Motivation to contribute ! https://stackoverflow.com/q/47452879/5691525
  • 6. Motivation to contribute ! https://stackoverflow.com/a/47452939/5691525
  • 7. But it didn’t work …
  • 8. Fixed it - My First PR
  • 9. My First PR - Fixes Typo
  • 10. My First PR - I didn’t follow Guidelines !! https://media.giphy.com/media/KSKvdT1YGCpUIonvSq/giphy.gif
  • 11. My First “Merged” PR/commit http://gph.is/15RTH5O
  • 12. Slowly & Steadily started adding more contributions
  • 13. Became Airflow Committer & (P)PMC Member https://twitter.com/ApacheAirflow/status/993950478785490945
  • 14. Steered Release for Airflow 1.10.2
  • 15. Became Leading Airflow Committer in Feb 2021
  • 16. What did I learn by working on Airflow?
      ● Writing unit-tests
      ● Improved Coding skills
      ● Got to know many companies & devs across the globe
      ● Improved communication skills
          ○ Commit messages & PR descriptions
          ○ Email threads on dev list
          ○ Presentations (Public Speaking was one of my fears !!)
  • 18. How to start contributing?
  • 19. How to start contributing?
      ● Contributing Guidelines: CONTRIBUTING.rst
      ● Contributing Quick Start Guide: CONTRIBUTORS_QUICK_START.rst
      ● Good First Issues: https://github.com/apache/airflow/contribute
      https://gph.is/g/ZWdK71X
  • 20. Contribution Workflow
      1. Find the issue you want to work on
      2. Setup a local dev environment
      3. Understand the codebase
      4. Write Code & add tests
      5. Run tests locally
      6. Create PR and wait for reviews
      7. Address any suggestions by reviewers
      8. Nudge politely if your PR is pending reviews for a while
  • 21. Finding issues to work on
  • 22. Finding issues to work on
      ● Start small: the aim should be to understand the process
      ● Bugs / features impacting you or your work
      ● Documentation Issues (including Contribution Guides)
          ○ Missing or outdated info, typos, formatting issues, broken links etc
      ● Good First Issues: https://github.com/apache/airflow/contribute
      ● Other open GitHub Issues: https://github.com/apache/airflow/issues
  • 23. Finding issues to work on - Open Unassigned Issues
      If the issue is open and unassigned, comment that you want to work on it. A committer will assign that issue to you. Then it is all yours.
  • 24. Finding issues to work on - Improving Documentation
      ● If you faced an issue with docs, fix it for future readers
      ● Documentation PRs are great first contributions
      ● Missing or outdated info, typos, formatting issues, broken links etc
      ● No need to write unit tests
      ● Examples:
          ○ https://github.com/apache/airflow/pull/16275
          ○ https://github.com/apache/airflow/pull/13462
          ○ https://github.com/apache/airflow/pull/15265
  • 25. Setup a local dev environment
  • 26. SetUp Local Development Environment
      ● Fork Apache Airflow repo & clone it locally
      ● Install pre-commit hooks to detect minor issues before creating a PR
          ○ Some of them even automatically fix issues e.g. ‘black’ formats python code
          ○ Install pre-commit framework: pip install pre-commit
          ○ Install pre-commit hooks: pre-commit install
      ● Use breeze - a wrapper around docker-compose for Airflow development.
          ○ Mac Users: Increase resources available to Docker for Mac
          ○ Check Prerequisites: https://github.com/apache/airflow/blob/main/BREEZE.rst#prerequisites
          ○ Setup autocomplete: ./breeze setup-autocomplete
  • 27. SetUp Local Development Environment - Breeze
      ● Airflow CI uses breeze too so it allows reproduction locally
      ● Allows running Airflow with different environments (different Python versions, different Metadata db, etc):
          ○ ./breeze --python 3.6 --backend postgres --postgres-version 12
      ● You can also run a local instance of Airflow using:
          ○ ./breeze start-airflow --python 3.6 --backend postgres
      ● You can then access the Webserver on http://localhost:28080
  • 28. SetUp Local Development Environment - Breeze
  • 30. Understand the Codebase
      ● apache/airflow is a mono-repo containing code for:
          ○ Apache Airflow Python package
          ○ More than 60 Providers (Google, Amazon, Postgres, etc)
          ○ Container image
          ○ Helm Chart
      ● Each of these items is released and versioned separately
      ● The contribution process for the entire repo is the same
  • 31. Understand the Codebase
      ● Do not try to understand the entire codebase at once
      ● Get familiar with the directory structure first
      ● Dive into the source code related to your issue
      ● Similar to: If you are moving to a new house, you would try to first get familiar with your immediate neighbours and then others. (unless you have memory like Sheldon Cooper !!!)
      http://gph.is/2F2nUVb
  • 32. Understand the Codebase - Directory Structure
      Area                        Paths (relative to the repository root)
      Core Airflow Docs           docs/apache-airflow
      Stable API                  airflow/api_connexion
      CLI                         airflow/cli
      Webserver / UI              airflow/www
      Scheduler                   airflow/jobs/scheduler_job.py
      Dag Parsing                 airflow/dag_processing
      Executors                   airflow/executors
      DAG Serialization           airflow/serialization
      Helm Chart (& its tests)    chart
      Container Image             Dockerfile
      Tests                       tests
  • 33. Understand the Codebase - Directory Structure
      Area                                        Paths (relative to the repository root)
      Providers                                   airflow/providers
      Core Operators                              airflow/operators
      Core Hooks                                  airflow/hooks
      Core Sensors                                airflow/sensors
      DB Migrations                               airflow/migrations
      ORM Models (Python Class -> DB Tables)      airflow/models
      Secrets Backend                             airflow/secrets
      Configuration                               airflow/configuration.py
      Permission Model                            airflow/www/security.py
      All Docs (incl. docs for Chart & Container image)    docs
  • 34. Understand the Codebase - Areas ● Get expertise in a certain area before diving into a different one. Easy Medium Complex (core) Docs Webserver Scheduler CLI Helm Chart Executors Operators / Hooks / Sensors (Providers) Dockerfile Configuration Stable API Secrets Backend Permission Model DB Migrations Dag Parsing
  • 35. Write Code, add docs & tests
  • 36. Write code
      ● Take inspiration from existing code
      ● E.g. when writing a hook, look at:
          ○ Code for other similar hooks
          ○ PRs that added other hooks to see everything that changed including docs & tests
      ● Check out Coding style and best practices in CONTRIBUTING.rst
  • 37. Add tests and docs
      ● The tests directory has the same structure as the airflow directory.
      ● E.g. if the code file is airflow/providers/google/cloud/operators/bigquery.py, tests for it should be added at tests/providers/google/cloud/operators/test_bigquery.py
      ● Docs for it would be at docs/apache-airflow-providers-google/operators/cloud/bigquery.rst
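The mirroring convention on the slide above can be sketched as a small path mapping. This is only an illustration: `expected_test_path` is a hypothetical helper, not something shipped with Airflow, and it assumes the simple "swap `airflow/` for `tests/` and prefix the file name with `test_`" rule described on the slide, which individual modules may not follow exactly.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path: str) -> str:
    """Map a module under airflow/ to the mirrored test module under tests/.

    Hypothetical helper for illustration; assumes the mirroring rule
    from the slide (same sub-directories, file name prefixed with test_).
    """
    path = PurePosixPath(source_path)
    if not path.parts or path.parts[0] != "airflow" or path.suffix != ".py":
        raise ValueError(f"not a Python module under airflow/: {source_path}")
    # Keep the intermediate directories, replace the root, prefix the file name.
    mirrored = PurePosixPath("tests", *path.parts[1:-1], f"test_{path.name}")
    return str(mirrored)


if __name__ == "__main__":
    print(expected_test_path("airflow/providers/google/cloud/operators/bigquery.py"))
```

For the slide's example this yields the same tests/providers/google/cloud/operators/test_bigquery.py location, which is a quick way to check where a new test file is expected to live.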
  • 39. Run tests locally - Single Test
      ● Start breeze: ./breeze --backend postgres --python 3.7
      ● Run a single test from a file: pytest tests/secrets/test_secrets.py -k test_backends_kwargs
  • 40. Run tests locally - Multiple Tests
      ● Start breeze: ./breeze --backend postgres --python 3.7
      ● Run all tests in a file: pytest tests/secrets/test_secrets.py
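To make the difference between the two invocations concrete, here is a minimal sketch of a pytest-style test module. Everything in it is hypothetical (the file name, the `add` function and both test names are made up for illustration and are not part of Airflow); it only shows how test function names relate to the `-k` filter.

```python
# Hypothetical file: tests/example/test_math_utils.py (not part of Airflow)


def add(a: int, b: int) -> int:
    """Toy function under test; stands in for real project code."""
    return a + b


def test_add_positive_numbers():
    # Collected by pytest because the function name starts with "test_".
    assert add(2, 3) == 5


def test_add_negative_numbers():
    assert add(-2, -3) == -5
```

Running `pytest tests/example/test_math_utils.py` would collect both tests, while adding `-k negative` would select only the test whose name matches the expression, here `test_add_negative_numbers`.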
  • 41. Run tests locally
      ● Similarly, you can run various different tests locally:
          ○ Integration Tests (with Celery, Redis, etc)
          ○ Kubernetes Tests with the Helm Chart
          ○ System Tests (useful for testing providers)
      ● Check TESTING.rst for more details on how you can run them
  • 42. Build docs locally
      ● If you have updated docs including docstrings, build docs locally
      ● Two types of tests for docs:
          1. Docs are built successfully with Sphinx
          2. Spelling Checks
  • 43. Build docs locally
      Example: If you updated Helm Chart docs (docs/helm-chart), build docs using
      ./breeze build-docs -- --package-filter helm-chart
  • 44. Ready to commit - Static Code Checks
      ● Once you are happy with your code, commit it
      ● Pre-commit hooks will run as soon as you run git commit
      ● ~90 pre-commit hooks (flake8, black, mypy, trim trailing whitespaces etc)
      ● All these hooks are documented in STATIC_CODE_CHECKS.rst
      ● Fix any failing hooks and run git add . && git commit again until all pass
      ● These checks will be run on CI too when you create a PR
  • 45. Ready to commit - Static Code Checks
  • 46. Write a good git commit message (Very Important)
      1. Separate subject from body with a blank line
      2. Limit the subject line to 50 characters
      3. Capitalize the subject line
      4. Do not end the subject line with a period
      5. Use the imperative mood in the subject line
      6. Wrap the body at 72 characters
      7. Use the body to explain what and why vs. how
      Source: https://chris.beams.io/posts/git-commit/
      Example: https://github.com/apache/airflow/commit/73b9163a8f55ce3d5bf6aec0a558952c27dd1b55
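The mechanical rules in the list above (1, 2, 3, 4 and 6) can be checked automatically; rules 5 and 7 need human judgment. The following is a rough sketch under that assumption. `check_commit_message` is a hypothetical helper written for this example; it is not one of Airflow's pre-commit hooks.

```python
from typing import List


def check_commit_message(message: str) -> List[str]:
    """Return a list of rule violations for a commit message (empty if none).

    Hypothetical checker covering only the mechanical rules:
    blank line after subject, 50-char subject, capitalized subject,
    no trailing period, body wrapped at 72 characters.
    """
    problems: List[str] = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]

    if len(subject) > 50:
        problems.append("subject is longer than 50 characters")
    if subject and not subject[0].isupper():
        problems.append("subject is not capitalized")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    # The second line, if present, must be blank to separate subject and body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line between subject and body")
    if any(len(line) > 72 for line in lines[1:]):
        problems.append("body line is longer than 72 characters")
    return problems


if __name__ == "__main__":
    print(check_commit_message("fixed a typo."))
```

A message that passes returns an empty list, so the function could be dropped into a local script that reads the message file before committing.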
  • 47. Create PR and wait for reviews
  • 48. Create PR
      ● Finally create a PR from your fork to apache/airflow repo
      ● Make sure to add PR description and title appropriately (similar to commit messages)
      ● You can add commits to your branch after creating the PR too
      ● Wait for one of the Committers to review the PR
      ● Reviewers of the PR might leave suggestions or ask for clarifications
      ● Ask for help on the PR itself if you have any questions by tagging Committers
  • 49. Wait for Reviews
      ● Be patient: sometimes it may take multiple days or weeks before you get a review
      ● If you don’t get any reviews after a couple of weeks, you can ask in the #development channel in the Airflow Slack Workspace.
  • 50. Tests on CI
      ● Tests will run via GitHub Actions as soon as you create a PR
      ● Fix any failing tests
  • 51. Tests on CI
      ● Sometimes you might see CI failures unrelated to your PRs
      ● It can be due to one of the following reasons:
          ○ Flaky tests
          ○ Tests/Code on “main” branch might be broken
          ○ GitHub Runner failures -- these are transient errors
          ○ Timeouts due to no available slot to run on Workers
      ● Failure of “Quarantined Tests” can be ignored -- those are expected to fail randomly
  • 52. When and who will merge the PR?
      ● One approved vote from a committer is needed before a PR can be merged
      ● One of the committers will merge the PR once the tests are completed
      ● Mention the committer who reviewed if your PR is approved but not merged for a while
  • 54. Communication channels
      ● Mailing Lists
          ○ Dev List - [email protected] (Public Archive Link)
              ■ official source for any decisions, discussions & announcements
              ■ "If it didn't happen on the dev list, it didn't happen"
              ■ Subscribe by sending email to [email protected]
          ○ User List - [email protected] (Public Archive Link)
      ● Airflow Slack Workspace: https://s.apache.org/airflow-slack (Public Archive Link)
      ● GitHub Discussions: https://github.com/apache/airflow/discussions
  • 55. Guidelines to become a committer
  • 56. Roles
      ● Contributors: Anyone who contributes code, documentation etc by creating PRs
      ● Committers: Community members that have ‘write access’ to the project’s repositories
      ● PMC Members: Members who are responsible for governance of the project
          ○ Binding votes on releases
          ○ Responsible for voting in new committers and PMC members to the project
          ○ Making sure code licenses and all ASF’s legal policies & brand are complied with
          ○ Dealing with vulnerability reports
  • 57. How to become a Committer - Prerequisites
      ● Guidelines are documented at https://github.com/apache/airflow/blob/main/COMMITTERS.rst
      ● You can become a committer either by (1) Code Contributions or (2) Community Contributions
      ● Prerequisites
          ○ Consistent contribution over the last few months
          ○ Visibility on discussions on the dev mailing list, Slack channels or GitHub issues/discussions
          ○ Contributions to community health and project's sustainability for the long-term
          ○ Understands contributor/committer guidelines: Contributors' Guide
  • 58. How to become a Committer - Code Contributions
      1. High-quality commits (especially commit messages), including upgrade paths or deprecation policies
      2. Testing Release Candidates
      3. Proposed and led to completion Airflow Improvement Proposal(s) - AIPs
      4. Champions one of the areas in the codebase like Airflow Core, API, Docker Image, Helm Chart, etc
      5. Made a significant improvement or added an integration that is important to the Airflow Ecosystem
  • 59. How to become a Committer - Community contributions
      1. Instrumental in triaging issues
      2. Improved documentation of Airflow in a significant way
      3. Led change and improvements in the “community” processes and tools
      4. Actively spreads the word about Airflow, for example organising Airflow Summit, workshops for community members, giving and recording talks at meetups & conferences, writing blogs
      5. Reporting bugs with detailed reproduction steps
  • 60. Airflow Improvement Proposal (AIP)
      ● The purpose of an AIP is to introduce any major change to Apache Airflow, mostly the ones that require architectural changes after planning and discussing with the community
      ● Details on https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
      ● Proposal lifecycle:
          ○ Discuss - discussions on the dev mailing list
          ○ Draft - create a proposal on the WIKI
          ○ Vote - vote on dev mailing list (only Committers & PMC Members have a binding vote)
          ○ Accepted - work is started if vote passes
          ○ Completed - once all PRs related to the AIPs are merged
  • 62. Links
      ● Airflow
          ○ Repo: https://github.com/apache/airflow
          ○ Website: https://airflow.apache.org/
          ○ Blog: https://airflow.apache.org/blog/
          ○ Documentation: https://airflow.apache.org/docs/
          ○ Slack: https://s.apache.org/airflow-slack
          ○ Twitter: https://twitter.com/apacheairflow
      ● Contact Me:
          ○ Twitter: https://twitter.com/kaxil
          ○ Github: https://github.com/kaxil/
          ○ LinkedIn: https://www.linkedin.com/in/kaxil/