Software-Defined Assets Explained

August 7, 2025

Software-Defined Assets are a new abstraction that lets data teams focus on the end products of their data pipelines, not just the individual tasks that produce them.

Why data needs declarative design

Software engineering has grown increasingly declarative. Front-end development has shifted from imperative libraries like jQuery to frameworks like React, allowing developers to describe what UI they want rather than how to render it. Similarly, DevOps has moved from shell scripts to Infrastructure as Code tools like Terraform and Pulumi, enabling teams to declare infrastructure states without detailing each provisioning step.

Declarative approaches are attractive because they simplify building and maintaining systems. Rather than defining how work is done, you describe the desired end state, and the framework handles execution.

While declarative methods are common in software engineering, data workflows often remain stubbornly imperative. This is partly because many data stacks are cobbled together from disparate scripts and cron jobs. Figuring out which process updates a particular database table or ML model can feel like a heroic act. And even after you find the relevant code, changes feel risky, with unknown side effects lurking in every refactor.

As data platforms scale, the need for a declarative approach grows more urgent. You can’t afford to treat data work as isolated jobs anymore; you need a higher-level abstraction. In DevOps, we manage infrastructure via resource definitions. In data, we should do the same and manage the assets (database tables, ML models, dashboards) as persistent objects that capture and encode some slice of the world. Assets may serve multiple teams and use cases, but they remain the fundamental units of a data platform.

From pipelines to assets

Declarative data engineering begins with defining assets in code, describing not just what the asset is but also how it’s generated and where it fits within the platform.

Each asset definition in Dagster includes:

  • An asset key, serving as a global identifier
  • An op, the function that produces the asset
  • A set of upstream asset keys, defining dependencies

Combining these elements creates a powerful model for building data applications. By tracking dependencies alongside each node, Dagster can automatically infer the asset graph, eliminating the need to manually define explicit DAGs that often become outdated or misaligned with actual dependencies. This approach makes it much easier to scale a data platform, especially as even modest organizations can have hundreds or thousands of assets.

Below is a simple asset definition using Dagster’s Python API. It defines an asset called “logins”, which is derived from an upstream asset called “website_events”.
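
import dagster as dg
from pandas import DataFrame


@dg.asset
def logins(website_events: DataFrame) -> DataFrame:
    # The body is illustrative; any logic that derives login
    # records from website_events would work here.
    return website_events[website_events["event_type"] == "login"]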

The asset key is `logins`, derived from the name of the decorated function. The function itself is the “op” and is used to compute the asset. The upstream asset, `website_events`, is inferred from the function’s argument.

The `logins` asset is defined as a pure function. While Dagster supports defining assets in pure Python, assets can represent many other entities. For example, a dbt model can be represented as an asset, and Dagster can load entire dbt projects to generate individual assets for each model. The same applies to tables in a Postgres database managed by Fivetran or dashboards in Looker. Any object can be represented as an asset.
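
For instance, here is a minimal sketch using the dagster-dbt integration to load every model in a dbt project as an asset (the project path here is a placeholder):

import dagster as dg
from dagster_dbt import DbtCliResource, DbtProject, dbt_assets

# Point at an existing dbt project; the path is an assumption.
my_dbt_project = DbtProject(project_dir="path/to/dbt_project")

@dbt_assets(manifest=my_dbt_project.manifest_path)
def my_dbt_assets(context: dg.AssetExecutionContext, dbt: DbtCliResource):
    # Each dbt model in the project becomes its own asset.
    yield from dbt.cli(["build"], context=context).stream()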

Beyond operational definitions, software-defined assets are also a natural hub for metadata. Below is an example of an asset definition with additional metadata.

# dg and DataFrame are imported in the earlier snippet.
LoginsSchema = DataFrame  # stand-in alias so the snippet runs as written

@dg.asset(
    metadata={
        "owner": "sandy@elementl.com",
        "domain": "core-site",
        "priority": 3,
    },
    partitions_def=dg.DailyPartitionsDefinition(start_date="2020-10-05"),
    kinds={"pandas"},
)
def logins(website_events: DataFrame, users: DataFrame) -> LoginsSchema:
    """The logins table contains an entry for each login to the site."""
    ...

Understanding the asset graph

A collection of software-defined assets forms an asset graph, which is essential for understanding and working with data. For developers, the asset graph provides a holistic view of the data platform and a clear way to navigate dependencies across processes and systems. For consumers, it reveals the status of the assets they rely on.

Taking this approach to the data platform means no single team is burdened with maintaining a monolithic DAG to capture all dependencies. Instead, each team owns a subset of the overall asset graph, managing only the assets relevant to their domain.

This approach allows teams or individual users to build on existing assets and take dependencies without duplicating work. It also eliminates the need to understand the internals of upstream assets, letting teams focus on the interfaces between them. By defining asset lineage declaratively, much of the chaos typically found in data platforms is naturally brought under control.

Orchestrating change in assets

Keeping assets in sync and up to date is typically the job of an orchestrator. Since orchestrators are the source of truth for scheduling, dependencies, and triggering computations, they hold the answers to critical questions:

  • Is this asset up to date?
  • What do I need to run to refresh this asset?
  • When will this asset be updated again?
  • What code generated this asset?

These are the questions that matter when maintaining a data platform. Unfortunately, traditional orchestrators are poorly equipped to answer them because most operate at the task level, not the asset level. As a result, they provide insight into pipeline status but not the state of the underlying assets we actually care about.

An orchestrator that centers on assets reduces the friction of translating between pipelines and the data products that matter. This approach makes it much easier to identify where errors occur and understand what might be impacted.

Reconciling assets

Asset-based orchestration isn’t just about making it easier to translate between assets and computation; it also enables a more principled approach to managing change through reconciliation. Like other declarative systems, reconciliation focuses on the difference between how things are and how they should be. The orchestrator is responsible for identifying these discrepancies and materializing assets as needed.

Compared to simple schedule-based materialization, this approach is far more efficient. Instead of reprocessing data shared across multiple workflows, only the assets required to reach the desired state are recomputed.

The most basic form of reconciliation occurs when a software-defined asset has no existing materialization. For example, here’s an asset graph that includes an unmaterialized asset, `comment_stories`.
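
A minimal sketch of such a graph (the upstream assets `comments` and `stories` are assumptions for this example):

import dagster as dg

@dg.asset
def comments():
    ...

@dg.asset
def stories():
    ...

@dg.asset
def comment_stories(comments, stories):
    # Never materialized yet, so the orchestrator reports a
    # discrepancy between the definition and the actual state.
    ...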

Materializing `comment_stories` resolves this discrepancy.

Another discrepancy arises when the contents of an asset no longer reflect the assets it depends on. By default, an asset with upstream dependencies should be based on the most recent contents of those upstream assets. If its most recent materialization predates the most recent materialization of any of its upstream assets, it is likely stale, and the orchestrator can indicate which of its dependencies are out of sync.
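
In Dagster, this kind of reconciliation can be expressed declaratively with automation conditions; a sketch, assuming a recent Dagster release:

import dagster as dg

@dg.asset(
    deps=["website_events"],
    # Rematerialize whenever an upstream asset has a newer materialization.
    automation_condition=dg.AutomationCondition.eager(),
)
def logins():
    ...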

Approaching orchestration in this declarative way prevents many of the pain points associated with traditional orchestrators. Instead of specifying schedules at fixed intervals or coordinating competing triggers, multiple users and teams can take dependencies on the same asset without the awkwardness of wasted computation and poor observability.

Software-defined assets and the Modern Data Stack

The Modern Data Stack refers to a collection of tools and practices that have significantly simplified common data workflows. For example:

  • Instead of writing custom tasks to chain SQL statements, dbt models directly express business logic within the database.
  • Instead of coding pipelines to replicate tables into your data warehouse, tools like Fivetran or dlt handle that out of the box.

These tools have been a massive quality-of-life improvement for data practitioners, but notable gaps remain. Many of them operate in isolation, with no effective way to unify them. Without a shared orchestration layer, teams are forced into brittle, error-prone workflows like manually coordinating schedules across disparate systems.

Another gap is in supporting Python and non-SQL workflows, which are often unique to each organization. Integrating tools like Pandas or PySpark into these pipelines adds complexity that the Modern Data Stack doesn’t easily address.

An asset-based orchestrator fills these gaps by combining polyglot compute with a declarative, asset-centric approach. Dagster makes Python a first-class citizen in the Modern Data Stack and enables asset graphs that span multiple systems and frameworks.

To make this even easier, Dagster supports Components. Components act as an abstraction layer over software-defined assets, allowing you to scaffold common integrations.

type: dagster_dbt.DbtProjectComponent

attributes:
  project: '{{ project_root }}/dbt'

This uses the `DbtProjectComponent` to read the dbt project located in the `dbt` directory at the root of our Dagster project and generate all of the necessary assets.

This gives us a Pythonic interface to our dbt models. We can now connect them to other tools in our data platform through additional components, or add our own custom assets to the asset graph, giving us full lineage across the data platform.
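
For example, a custom Python asset can declare a dependency on a dbt model by its asset key (the model name `orders` is hypothetical):

import dagster as dg

@dg.asset(deps=["orders"])  # "orders" stands in for a dbt model loaded by the component
def weekly_order_report():
    # Downstream logic building on the dbt model goes here.
    ...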

Conclusion

Approaching data with software-defined assets creates a more powerful foundation for building platforms with built-in lineage and observability. This lets teams focus on what their data should produce and the value it delivers, while the orchestrator handles reconciliation and ensures assets are kept in their desired state.

The Modern Data Stack is already moving toward declarative, asset-based principles. By defining dependencies and scheduling policies at the asset level rather than the task level, an asset-based orchestrator becomes the natural fit for this paradigm. Dagster enables this shift by supporting heterogeneous compute and providing a unified control plane, without forcing practitioners back into task-centric, imperative programming.

Have feedback or questions? Start a discussion in Slack or GitHub.
