Generated AI image by Microsoft Bing Image Creator
Introduction
If you’re like me, you’ve probably felt the frustration of developing AWS applications directly in the cloud, whether that’s Lambda, CloudFront, or S3, with tools like SAM and CloudFormation. Now that I’ve been delving into data engineering, where AWS Glue is one of AWS’s main tooling offerings, I’ve hit the same frustrations with Glue jobs: pushing every change directly to the cloud, waiting for job runs, wrestling with the nitty-gritty setup of the AWS Glue binaries, dealing with slow feedback loops, and watching the AWS bill creep up with every test iteration. I’ve been there, and it’s not fun.
That’s why I built this local development environment. After experimenting with different approaches, I’ve put together a Docker-based setup that lets me develop and test Glue jobs on my laptop before deploying to AWS. The best part? It includes the real-world components I actually use in my cloud environments: Kafka for streaming data, Iceberg for my data lake tables, and LocalStack to simulate AWS services.
In this guide, I’ll walk you through my setup step by step. Whether you’re working on batch ETL jobs or real-time streaming pipelines, this environment will save you time, money, and a lot of headaches.