Airflow_Best_Practices
• What is Airflow?
• Architecture Overview
• Workflow Components
• Example Workflow
• Establishing Connections
• Scheduling and Execution
• Best Practices
Agenda Continued
Airflow schedules workflows using concepts similar to cron, but a single workflow can execute a more complex
set of tasks.
Workflow (DAG) Overview
Example: extract data from a database, write it to the data lake, clean and validate it, transform it, and finally
load it somewhere. Each node is a separate task, and the arrows show the dependencies between tasks.
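To make the node-and-arrow idea concrete, here is a minimal sketch that models the same pipeline with Python's standard-library graphlib rather than Airflow itself. The task names are illustrative assumptions taken from the description above. In an Airflow DAG the same edges are typically declared between operators with the >> operator.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
# Task names are illustrative and mirror the pipeline described above.
dependencies = {
    "extract_from_database": set(),
    "write_to_data_lake": {"extract_from_database"},
    "clean_and_validate": {"write_to_data_lake"},
    "transform": {"clean_and_validate"},
    "load": {"transform"},
}

# static_order() yields each task only after all of its upstream tasks.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because the pipeline is a simple chain, there is only one valid order; a workflow with branches would allow several.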
Workflow Components: Task
• Configured in the UI
• Have a unique ID used within hooks
• Abstract in the sense that a file path may be a hook used within a FileSensor
• Information is encrypted with Fernet keys
Variables
DAGs and tasks may be documented, and this documentation is displayed within the web UI.
Create markdown templates to follow and require them to be used with every DAG and task.
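One way to enforce a documentation template is sketched below in plain Python. The section names, the template, and both helper functions are hypothetical and not taken from the slides. The rendered string is the kind of value that could be passed as doc_md to a DAG or task so that it shows up in the web UI.

```python
# Hypothetical documentation template; the section names are assumptions.
REQUIRED_SECTIONS = ("Purpose", "Owner", "Inputs", "Outputs")

DAG_DOC_TEMPLATE = """\
### Purpose
{purpose}

### Owner
{owner}

### Inputs
{inputs}

### Outputs
{outputs}
"""


def render_dag_doc(purpose: str, owner: str, inputs: str, outputs: str) -> str:
    """Fill in the template; the result is a markdown string."""
    return DAG_DOC_TEMPLATE.format(
        purpose=purpose, owner=owner, inputs=inputs, outputs=outputs
    )


def missing_sections(doc_md: str) -> list:
    """Return the required section headings that are absent from doc_md."""
    headings = {
        line[4:].strip()
        for line in doc_md.splitlines()
        if line.startswith("### ")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

A check such as missing_sections could run in a unit test or code review hook, so that a DAG with incomplete documentation is caught before it is deployed.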
Best Practice: Avoid costly code execution during load time of a DAG
• Airflow loads the DAGs on a regular basis (default 30 seconds), reading the entire script
• Long, slow-running code in the global scope of the script makes the load take extra time
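The difference between load-time and run-time work can be sketched without Airflow. In this minimal example, load_reference_data is a hypothetical stand-in for a slow database query or API call, and the CALLS counter exists only to make the behaviour visible. In a real DAG file, process_rows would be the callable handed to a task (for example through PythonOperator's python_callable argument).

```python
import time

CALLS = {"expensive": 0}  # counter used only to make the behaviour visible


def load_reference_data() -> list:
    """Hypothetical stand-in for slow work such as a database query."""
    CALLS["expensive"] += 1
    time.sleep(0.01)
    return [1, 2, 3]


# Anti-pattern: a call in the global scope runs every time the file is loaded.
# REFERENCE_DATA = load_reference_data()


def process_rows() -> int:
    """Task callable: the slow call happens only when the task executes."""
    reference_data = load_reference_data()
    return sum(reference_data)
```

Defining process_rows costs nothing at load time; the slow call runs once per task execution instead of once per load.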
Best Practice: Use the with statement
Without the with statement: difficult to interpret quickly!
With the with statement: much better!
Best Practice: Use factories to generate common patterns