In this workshop you will work with a serverless, event-driven data aggregation pipeline built with AWS Lambda, Amazon DynamoDB, and Amazon Kinesis Data Streams.
Be aware, though, that the pipeline is broken and needs your attention to get running! As you will soon discover, the workshop is designed with an input stream that contains duplicate records and Lambda functions that fail at random.
Over the course of two labs you will first connect all the elements of the pipeline, then update the Lambda functions so that messages are neither lost nor duplicated under the (induced) random failures.
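The usual way to survive duplicate deliveries and retries is to make each write idempotent. The sketch below is illustrative only, not the workshop's solution code: the table name `aggregates`, the key `pk`, and the record shape are assumptions. It uses a DynamoDB conditional write so that reprocessing the same record becomes a no-op.

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("aggregates")  # hypothetical table name

def process_record(record_id: str, payload: dict) -> None:
    """Write a record at most once; duplicate deliveries become no-ops."""
    try:
        table.put_item(
            Item={"pk": record_id, **payload},
            # Succeed only if this record_id has never been written before.
            ConditionExpression="attribute_not_exists(pk)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # Duplicate delivery: the record was already processed.
        raise  # Any other failure should surface so the batch is retried.
```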
Here’s what this workshop includes:
Set up the lab environment.
Lab goals
Process streaming data to create an end-to-end data processing pipeline.
Query a sharded global secondary index to quickly read data sorted by status code and date (a query sketch follows this list).
Explore how to maintain the ability to query on many attributes when you have a multi-entity table.
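To give a feel for the second goal, here is a minimal sketch of reading a write-sharded global secondary index. Everything in it is an assumption for illustration, not the workshop's actual schema: the index name `GSI1`, the key names `GSIPK`/`GSISK`, a partition key of the form `<status>#<shard>`, and a shard count of 10. Each shard partition is queried separately and the results are merged by the sort key.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("aggregates")  # hypothetical table name

SHARD_COUNT = 10  # assumed number of write shards

def query_by_status_and_date(status: int, start: str, end: str) -> list:
    """Read items for one status code across all GSI shards, sorted by date."""
    items = []
    for shard in range(SHARD_COUNT):
        resp = table.query(
            IndexName="GSI1",  # hypothetical index name
            KeyConditionExpression=(
                Key("GSIPK").eq(f"{status}#{shard}")
                & Key("GSISK").between(start, end)  # ISO-8601 date strings
            ),
        )
        items.extend(resp["Items"])
    # Each shard returns its items in sort-key order; merge them globally.
    return sorted(items, key=lambda item: item["GSISK"])
```

The same generic `GSIPK`/`GSISK` keys can be overloaded to index several entity types in one table, which is the pattern the third goal explores.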
The workshop is intended for anyone interested in understanding how to build serverless data processing pipelines. A basic understanding of AWS services and some Python programming experience are desirable but not required. We classify this as a 300-level workshop, which means you don't need to be an expert in any of the three services it focuses on.
The workshop requires approximately 2 hours to complete.