Dagster is a data orchestrator for machine learning, analytics, and ETL
Implement components in any tool, such as Pandas, Spark, SQL, or dbt.
Define your pipelines in terms of the data flow between reusable, logical components.
Test locally and run anywhere with a unified view of data pipelines and assets.
Develop and test on your laptop, deploy anywhere
With Dagster’s pluggable execution, the same pipeline can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premises or in any cloud.
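The idea of pluggable execution can be illustrated with a minimal sketch in plain Python. This is not Dagster's API: the `Executor` protocol, both executor classes, and `square_all` are hypothetical names invented for illustration, and a thread pool stands in for a distributed work queue. The point is only that the pipeline logic stays the same while the execution backend is swapped.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Protocol


class Executor(Protocol):
    """Hypothetical interface: anything that can run a list of tasks."""

    def run(self, tasks: List[Callable[[], int]]) -> List[int]: ...


class InProcessExecutor:
    """Runs every task sequentially in the current process."""

    def run(self, tasks: List[Callable[[], int]]) -> List[int]:
        return [task() for task in tasks]


class ThreadPoolBackedExecutor:
    """Stand-in for a distributed work queue: hands tasks to a worker pool."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def run(self, tasks: List[Callable[[], int]]) -> List[int]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(task) for task in tasks]
            # Collect results in submission order, not completion order.
            return [future.result() for future in futures]


def square_all(values: List[int], executor: Executor) -> List[int]:
    """The 'pipeline': identical logic regardless of where it executes."""
    tasks = [lambda v=v: v * v for v in values]
    return executor.run(tasks)


local_result = square_all([1, 2, 3], InProcessExecutor())
pooled_result = square_all([1, 2, 3], ThreadPoolBackedExecutor(max_workers=2))
```

Both calls compute the same squares; only the executor argument changes, which is the property the paragraph above describes.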
Model and type the data produced and consumed by each step
Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.
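As a rough illustration of typed data flow between steps, here is a toy sketch in plain Python. It does not use Dagster's API: `Step`, `run_graph`, and the example functions are hypothetical, and the sketch assumes the graph is acyclic and that annotations are plain classes such as `list` or `int`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, get_type_hints


@dataclass
class Step:
    """Hypothetical step: a function plus the steps that feed its parameters."""

    fn: Callable[..., Any]
    # Maps a parameter name to the name of the step producing its value.
    upstream: Dict[str, str] = field(default_factory=dict)


def run_graph(steps: Dict[str, Step]) -> Dict[str, Any]:
    """Run steps in dependency order, checking annotated types at each boundary.

    Assumes the graph has no cycles; a cycle would recurse without end.
    """
    results: Dict[str, Any] = {}

    def check(value: Any, expected: Any, where: str) -> None:
        # Only plain classes are checked; unannotated values pass through.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{where}: expected {expected.__name__}, got {type(value).__name__}"
            )

    def resolve(name: str) -> Any:
        if name in results:
            return results[name]
        step = steps[name]
        hints = get_type_hints(step.fn)
        kwargs = {}
        for param, producer in step.upstream.items():
            value = resolve(producer)
            check(value, hints.get(param), f"input '{param}' of '{name}'")
            kwargs[param] = value
        output = step.fn(**kwargs)
        check(output, hints.get("return"), f"output of '{name}'")
        results[name] = output
        return output

    for step_name in steps:
        resolve(step_name)
    return results


def load_numbers() -> list:
    return [3, 1, 2]


def total(numbers: list) -> int:
    return sum(numbers)


results = run_graph(
    {
        "load_numbers": Step(load_numbers),
        "total": Step(total, upstream={"numbers": "load_numbers"}),
    }
)
```

If a step returned a value that did not match its annotation, `run_graph` would raise a `TypeError` at that boundary instead of passing the bad value downstream.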
Link data to computations
Track what’s produced by your pipelines with Dagster’s Asset Manager, so you can understand how your data was generated and trace issues when it doesn’t look how you expect.
Build a self-service data platform
Dagster helps platform teams build systems for data practitioners. Pipelines are built from shared, reusable, configurable data processing and infrastructure components. Dagster’s web interface lets anyone inspect these objects and discover how to use them.
Avoid dependency nightmares
Dagster’s repository model lets you isolate codebases, so that problems in one pipeline don’t bring down the rest. Each pipeline can have its own package dependencies and Python version. Pipelines run in isolated processes so user code issues can’t bring the system down.
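The general principle behind process isolation can be sketched with the Python standard library. This is not how Dagster itself launches user code: `run_isolated` is a hypothetical helper, and the sketch assumes a child interpreter is available at `sys.executable`. It shows only that a failure in the child process leaves the parent running.

```python
import subprocess
import sys


def run_isolated(source: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run Python source in a child interpreter so a crash stays in the child."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the child's stdout and stderr for inspection
        text=True,
        timeout=timeout,
    )


# A well-behaved step and a failing step, each in its own process.
ok = run_isolated("print('rows loaded: 3')")
crashed = run_isolated("raise SystemExit(2)")

# The parent is still running and can inspect both outcomes.
statuses = {"ok": ok.returncode, "crashed": crashed.returncode}
```

The parent sees the failure as a nonzero return code and decides what to do next, rather than going down with the user code.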
Debug pipelines from a rich UI
Dagit, Dagster’s web interface, includes extensive tools for understanding the pipelines it orchestrates.
When inspecting a pipeline run, you can query logs, find the most time-consuming tasks in a Gantt chart, and re-execute subsets of steps.
Dagster’s UI runs locally on your machine and can also be deployed to your production infrastructure for operational monitoring.
You’re in good company
Dagster is used to orchestrate data pipelines at some of our favorite companies.
Recent blog posts
Dagster 0.10.0: The Edge of Glory
In 0.10.0, we introduce unique event-based scheduling capabilities, hardened deployments on Kubernetes, and new primitives for persistence.
Good Data at Good Eggs: Using Dagster to manage the data platform
Running pipelines is only part of the operational burden of running a data platform. We also need to manage the platform itself and control associated technical debt. We found that Dagster was a very natural place to do that work, with the advantage that our entire operational view of the platform is consolidated in a single tool.
Broad support for existing pipelines and deployments
Adopt Dagster incrementally by wrapping existing code in Dagster solids.