Running Apache Airflow in a Docker Compose setup provides a reliable way to test configurations and develop workflows without touching the host system. This approach packages the scheduler, webserver, and metadata database into isolated containers while preserving the familiar folder structure for DAGs. For data teams, it bridges the gap between a simple local prototype and a full Kubernetes deployment.
Why Choose Docker Compose for Airflow
Docker Compose lowers the barrier to entry for new users who want to experiment with operators, sensors, and custom hooks. Instead of installing PostgreSQL, Redis, and multiple Python dependencies, you can launch a complete stack with a single command. The configuration is version-controlled alongside your DAGs, ensuring that the development environment matches staging or production more closely.
Standard Architecture Components
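A typical stack includes the Airflow scheduler, the webserver for UI interactions, and a PostgreSQL container for metadata storage. When the stack runs CeleryExecutor, it also needs worker containers and Redis as the message broker; LocalExecutor needs neither. You can add extra services for logging or metrics, but these core components handle most use cases. Bind mounts make edits to DAG files visible inside the scheduler and webserver containers immediately, although the scheduler loads a changed DAG only on its next parsing pass, and changes to airflow.cfg generally take effect only after the affected service restarts.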
Essential Files and Folder Structure
docker-compose.yaml defining services, networks, and volumes.
airflow.cfg tuned for local storage and the metadata database connection (PostgreSQL for the stack described here; SQLite suits only quick, non-parallel experiments).
A dags/ directory mounted into the scheduler and webserver to sync DAG definitions.
Environment variables or an .env file to set passwords and image tags securely.
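As a rough sketch of how the files listed above fit together, the fragment below mounts dags/ and airflow.cfg into both Airflow services and reads settings from .env. It assumes an Airflow 2.x official image, where AIRFLOW_HOME is /opt/airflow; the variable name AIRFLOW_IMAGE_TAG is hypothetical and would be defined in your own .env file.

```yaml
# docker-compose.yaml (fragment): mapping the files above into the containers.
# Assumes an Airflow 2.x official image, whose AIRFLOW_HOME is /opt/airflow.
services:
  airflow-scheduler:
    image: apache/airflow:${AIRFLOW_IMAGE_TAG}  # hypothetical variable, set in .env
    env_file: .env                              # passes passwords and settings into the container
    command: scheduler
    volumes:
      - ./dags:/opt/airflow/dags                # DAG definitions from the host
      - ./airflow.cfg:/opt/airflow/airflow.cfg  # local configuration file

  airflow-webserver:
    image: apache/airflow:${AIRFLOW_IMAGE_TAG}
    env_file: .env
    command: webserver
    volumes:
      - ./dags:/opt/airflow/dags
      - ./airflow.cfg:/opt/airflow/airflow.cfg
```

Compose reads a .env file in the project directory for ${...} substitution on its own; the env_file entry is what makes the same values available inside the containers.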
Step-by-Step Setup and Best Practices
Start by choosing an official Airflow image and specifying the executor, usually LocalExecutor for simplicity. Define the environment variables _AIRFLOW_WWW_USER_CREATE, _AIRFLOW_WWW_USER_USERNAME, and _AIRFLOW_WWW_USER_PASSWORD so that the image's entrypoint creates an admin account automatically. Use a named volume for the database to persist data across container restarts and rebuilds.
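A minimal sketch of these steps might look like the fragment below. It assumes an Airflow 2.7 or later image (earlier 2.x images use _AIRFLOW_DB_UPGRADE instead of _AIRFLOW_DB_MIGRATE); POSTGRES_PASSWORD and AIRFLOW_IMAGE_TAG are hypothetical variable names expected in .env, and the postgres:16 tag is only an example.

```yaml
# docker-compose.yaml (fragment): database with a named volume plus a one-shot init service.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}  # hypothetical variable from .env
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data  # named volume, survives rebuilds

  airflow-init:
    image: apache/airflow:${AIRFLOW_IMAGE_TAG}  # hypothetical variable from .env
    depends_on: [postgres]
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      # "postgres" in the URL is the service name above, resolved by Compose DNS.
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
      # Read by the official image's entrypoint before the command runs.
      _AIRFLOW_DB_MIGRATE: "true"
      _AIRFLOW_WWW_USER_CREATE: "true"
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD}
    command: version  # exits after the entrypoint has migrated the DB and created the user

volumes:
  postgres-db-volume:
```

Running `docker compose up airflow-init` once before starting the other services should leave an initialized metadata database and an admin login in place.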
Network and Security Considerations
Isolate the stack on a custom network so that services discover each other via internal DNS. Avoid exposing the database port to your host unless necessary, and restrict port 8080 to localhost during development. Rotate the secret key and database passwords before promoting the setup to any shared or cloud environment.
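The fragment below sketches one way to apply these points; it is meant to be merged into existing service definitions rather than used on its own. The network name airflow-net and the variables AIRFLOW_SECRET_KEY and AIRFLOW_FERNET_KEY are hypothetical names chosen for illustration.

```yaml
# docker-compose.yaml (fragment): custom network, loopback-only UI, secrets from .env.
services:
  postgres:
    networks: [airflow-net]
    # No "ports:" entry, so the database is reachable only from containers on airflow-net.

  airflow-webserver:
    networks: [airflow-net]
    ports:
      - "127.0.0.1:8080:8080"  # UI bound to the host's loopback interface only
    environment:
      AIRFLOW__WEBSERVER__SECRET_KEY: ${AIRFLOW_SECRET_KEY}  # hypothetical variable from .env
      AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY}       # hypothetical variable from .env

networks:
  airflow-net:
    driver: bridge
```

Keeping the keys in .env (excluded from version control) means rotating them before a shared deployment is a matter of changing that file and restarting the services.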
Scaling and Performance Tips
With CeleryExecutor you can increase throughput by adding worker containers; with LocalExecutor, tasks run as subprocesses of the scheduler, so capacity is bounded by that one container. In either case, ensure the metadata database (and Redis, if used) can handle the load. Monitor memory usage within the containers and adjust parallelism and max_active_tasks_per_dag (named dag_concurrency before Airflow 2.2) in airflow.cfg accordingly. For more demanding workloads, consider moving to CeleryExecutor with a robust broker configuration.
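As an illustration, the same settings can be supplied as environment variables, which take precedence over airflow.cfg. The values below are placeholders to tune against observed memory usage, not recommendations, and the example assumes Airflow 2.2 or later, where max_active_tasks_per_dag is the current name for dag_concurrency.

```yaml
# docker-compose.yaml (fragment): concurrency limits and a memory cap for the scheduler.
services:
  airflow-scheduler:
    mem_limit: 2g  # illustrative cap; the container is killed if it exceeds this
    environment:
      # AIRFLOW__<SECTION>__<KEY> overrides the matching airflow.cfg entry.
      AIRFLOW__CORE__PARALLELISM: "16"              # max running task instances per scheduler
      AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: "8"  # max concurrent tasks within one DAG
```

With LocalExecutor the tasks run inside the scheduler container, so the memory cap and the parallelism value need to be sized together.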
Troubleshooting Common Issues
Permission errors on the mounted dags/ folder often occur when the user ID inside the container does not match the owner of the folder on the host. Set the user key on the Airflow services in the compose file to the host user's ID, or adjust folder ownership on the host. If the webserver fails to connect to the database, verify that the PostgreSQL container is healthy and that the connection URL, whether set in airflow.cfg or through the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN environment variable, uses the Compose service name as the hostname.
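Both fixes can be sketched in the compose file as follows. The AIRFLOW_UID variable follows the convention of the official Airflow compose file (typically written to .env with `echo "AIRFLOW_UID=$(id -u)" > .env`); POSTGRES_PASSWORD is a hypothetical variable name.

```yaml
# docker-compose.yaml (fragment): matching user IDs and waiting for a healthy database.
services:
  postgres:
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]  # succeeds once PostgreSQL accepts connections
      interval: 10s
      retries: 5

  airflow-webserver:
    user: "${AIRFLOW_UID:-50000}:0"  # host UID with group 0, as in the official compose file
    depends_on:
      postgres:
        condition: service_healthy   # start only after the healthcheck passes
    environment:
      # The hostname "postgres" must match the service name above.
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
```

`docker compose ps` shows the health status of the postgres service, which is usually the quickest way to tell a database problem from a misconfigured URL.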
Next Steps Beyond Development
Once the compose-based workflow is stable, you can write deployment scripts that reuse the same container images in a staging or production environment. Integrate the setup with CI/CD pipelines to run DAG tests in an ephemeral environment before merging changes. From here, evaluating managed platforms or a Kubernetes-based deployment becomes a matter of scaling and operational overhead rather than a complete re-architecture.