The dagster-airflow package provides interoperability between Dagster and Airflow. It is designed to support users who already run Airflow and are looking to explore Dagster.
While Airflow and Dagster have some significant differences, many of their concepts overlap. To ease the transition, we recommend using the cheatsheet below to understand how Airflow concepts map to Dagster.
| Airflow concept | Dagster concept | Notes |
| --- | --- | --- |
| DAG | Job | See the job and op example below. |
| Task | Op | |
| Operator | None | Dagster uses plain Python functions rather than framework-specific operator classes. For off-the-shelf functionality with third-party tools, Dagster provides integration libraries. |
| Scheduler | Scheduler | |
| Executor | Executor | |
| DagBag | Code Locations | Multiple isolated code locations with different system and Python dependencies can exist within the same Dagster instance. |
| Instance | Instance | |
| SubDAGs/TaskGroups | Graphs, Tags, and AssetGroups | Dagster provides rich, searchable metadata and tagging support well beyond what's offered by Airflow. |
| Hooks | Resources | Dagster resources provide a superset of the functionality of hooks and have much stronger composition guarantees. See the resource example below. |
| Pools | Run Coordinator | |
| XComs | IO Manager | I/O managers are more powerful than XComs and allow passing large datasets between jobs. See the I/O manager example below. |
| Trigger | Launchpad | Triggering and configuring ad hoc runs is easier in Dagster, which allows them to be initiated through Dagit, the GraphQL API, or the CLI. |
| Sensor | Sensor | |
| DAG run | Job run | |
| Plugins/Providers | Integrations | |
| Datasets | Software-defined assets (SDAs) | SDAs are more powerful and mature than Airflow Datasets, with support for features such as partitioning. See the asset example below. |
| Connections/Variables | Run config, the Configured API, and environment variables (Dagster Cloud only) | |
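
To make the DAG-to-job and task-to-op rows concrete, here is a minimal sketch of an Airflow-style pipeline expressed as a Dagster job. The op and job names are illustrative, not part of any API:

```python
from dagster import job, op


@op
def extract_numbers():
    # An op is the Dagster analogue of an Airflow task.
    return [1, 2, 3]


@op
def summarize(numbers):
    # Data flows between ops directly as inputs and outputs,
    # rather than through an XCom-style side channel.
    print(sum(numbers))


@job
def etl_job():
    # A job is the Dagster analogue of an Airflow DAG; the dependency
    # graph is expressed by composing ops as ordinary function calls.
    summarize(extract_numbers())
```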
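Where an Airflow task might reach for a hook, a Dagster op declares a resource dependency. A minimal sketch, using a stand-in client class in place of a real database library:

```python
from dagster import job, op, resource


@resource(config_schema={"conn_string": str})
def database_resource(init_context):
    # Stand-in client for illustration; a real resource would return a
    # connection from a library such as sqlalchemy or psycopg2.
    class StubClient:
        def __init__(self, conn_string):
            self.conn_string = conn_string

        def execute(self, sql):
            return f"ran {sql!r} against {self.conn_string}"

    return StubClient(init_context.resource_config["conn_string"])


@op(required_resource_keys={"db"})
def run_query(context):
    # The op consumes the resource through the context, so the same op
    # works unchanged with any implementation bound to the "db" key.
    return context.resources.db.execute("SELECT 1")


@job(resource_defs={"db": database_resource})
def db_job():
    run_query()
```

Because the binding happens at the job level, swapping in a mock resource for tests requires no changes to the op itself.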
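Unlike XComs, which serialize values into Airflow's metadata database, an I/O manager gives you full control over where and how op outputs are stored. A minimal sketch that pickles outputs to local disk (the storage path is an arbitrary choice for this example):

```python
import os
import pickle

from dagster import IOManager, io_manager, job, op


class LocalPickleIOManager(IOManager):
    def _path(self, context):
        # get_identifier() returns run-scoped path components for the output.
        return os.path.join("/tmp/dagster_storage", *context.get_identifier())

    def handle_output(self, context, obj):
        # Called after an op runs: persist its output to disk.
        path = self._path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context):
        # Called before a downstream op runs: load the upstream output.
        with open(self._path(context.upstream_output), "rb") as f:
            return pickle.load(f)


@io_manager
def local_pickle_io_manager(_init_context):
    return LocalPickleIOManager()


@op
def produce():
    return list(range(100_000))  # too large to be comfortable in an XCom


@op
def consume(items):
    print(len(items))


@job(resource_defs={"io_manager": local_pickle_io_manager})
def io_job():
    consume(produce())
```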
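Finally, where an Airflow Dataset is only a pointer used for scheduling, a software-defined asset owns both the definition of the data and the computation that produces it. A small sketch with two hypothetical assets:

```python
from dagster import asset


@asset
def raw_users():
    # The asset owns the computation that materializes it.
    return [{"id": 1, "active": False}, {"id": 2, "active": True}]


@asset
def active_users(raw_users):
    # Declaring the upstream asset as a parameter gives Dagster a full
    # lineage graph, which is what enables features like partitioning
    # and asset-level scheduling.
    return [user for user in raw_users if user["active"]]
```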