About this integration
This resource provides access to a PySpark SparkSession for executing PySpark code within Dagster.
Installation
pip install dagster-pyspark
Example
See the with_pyspark_emr example project.
About PySpark
PySpark is the Python API for Apache Spark, a distributed framework and set of libraries for real-time, large-scale data processing. PySpark allows you to create more scalable analyses and data pipelines.