Using AWS with Dagster
Utilities for interfacing with AWS: S3, ECS, EMR, CloudWatch, SecretsManager, and Redshift.
About this integration
The dagster-aws integration lets you use key AWS services in your data pipelines:
- S3 (Object storage)
- ECS (Elastic Container Service, for running containerized workloads)
- Redshift (Data warehousing)
- EMR (Petabyte-scale data processing with Apache Spark, Hive, Presto, and other big data frameworks)
- CloudWatch (Application and infrastructure monitoring)
- SecretsManager (Manage, retrieve, and rotate database credentials, API keys, and other secrets)
Installation
pip install dagster-aws
Examples
# Store your software-defined assets in S3
import pandas as pd

from dagster import Definitions, asset
from dagster_aws.s3 import S3PickleIOManager, S3Resource


@asset
def asset1():
    return pd.DataFrame()


@asset
def asset2(asset1):
    # asset1 is loaded from S3 by the I/O manager
    return asset1[:5]


defs = Definitions(
    assets=[asset1, asset2],
    resources={
        "io_manager": S3PickleIOManager(
            s3_bucket="my-cool-bucket",
            s3_prefix="my-cool-prefix",
            s3_resource=S3Resource(),
        )
    },
)
About Amazon Web Services
AWS provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. Whether you're looking for compute power, database storage, content delivery, or other functionality, AWS has the services to help you build sophisticated applications with increased flexibility, scalability and reliability.