Relational database management system (RDBMS) platforms store data in a normalized relational structure. This structure breaks hierarchical data down and stores it across multiple tables. You can query the data from multiple tables and assemble it at the presentation layer, but that is not efficient for ultra-low-latency workloads. To support high-traffic queries with ultra-low latency, taking advantage of a NoSQL system generally makes technical and economic sense.
To start designing a target data model in Amazon DynamoDB that scales efficiently, you must first identify the common access patterns. For the IMDb use case, we have identified the set of access patterns described below:
A common approach to DynamoDB schema design is to identify application-layer entities and use denormalization and composite key aggregation to reduce query complexity. In DynamoDB, this means using composite sort keys, overloaded global secondary indexes, partitioned tables and indexes, and other design patterns. In this scenario, we follow the adjacency list design pattern, which is a common way to represent relational data structures in Amazon DynamoDB. The advantages of this pattern include minimal data duplication and simplified query patterns for finding all metadata related to each movie. The partition key in this model is tconst (the unique movie ID), and the sort key is overloaded to define the item type within the collection. The following prefixes are used to identify the record type in the collection:
A new global secondary index (GSI) is created on the movies table with a new partition key, nconst (a unique ID for each crew member), and a sort key, the movie start year. This GSI supports queries by crew member (access pattern #6 in the common access pattern table).
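The key design described above can be sketched in plain Python without an AWS account. This is a minimal in-memory illustration and not the workshop's actual table definition: the sort key attribute name `sk`, the prefixes `MOVIE` and `CREW#`, the `startYear` attribute name, and all sample items are assumptions made for the example. The base table groups items by tconst and filters on a sort key prefix, and the index regroups the same items by nconst, ordered by start year.

```python
from collections import defaultdict

# Hypothetical sort-key prefixes; the real prefixes used by the model
# are not shown here, so these are placeholders.
MOVIE_PREFIX = "MOVIE"
CREW_PREFIX = "CREW#"

# Made-up sample items. Every item carries the partition key (tconst)
# and an overloaded sort key (sk) whose prefix marks the item type.
ITEMS = [
    {"tconst": "tt0000001", "sk": "MOVIE", "title": "Example Movie A", "startYear": 1999},
    {"tconst": "tt0000001", "sk": "CREW#nm0000001", "nconst": "nm0000001",
     "role": "director", "startYear": 1999},
    {"tconst": "tt0000001", "sk": "CREW#nm0000002", "nconst": "nm0000002",
     "role": "actor", "startYear": 1999},
    {"tconst": "tt0000002", "sk": "MOVIE", "title": "Example Movie B", "startYear": 2005},
    {"tconst": "tt0000002", "sk": "CREW#nm0000001", "nconst": "nm0000001",
     "role": "producer", "startYear": 2005},
]


def build_table(items):
    """Group items into collections by partition key, sorted by sort key."""
    table = defaultdict(list)
    for item in items:
        table[item["tconst"]].append(item)
    for collection in table.values():
        collection.sort(key=lambda i: i["sk"])
    return table


def build_crew_index(items):
    """Mimic a GSI keyed on nconst with startYear as the sort key.

    Items without both index key attributes are skipped, which is how
    a sparse secondary index behaves.
    """
    index = defaultdict(list)
    for item in items:
        if "nconst" in item and "startYear" in item:
            index[item["nconst"]].append(item)
    for collection in index.values():
        collection.sort(key=lambda i: i["startYear"])
    return index


def query_movie(table, tconst, sk_prefix=""):
    """Return items for one movie whose sort key starts with sk_prefix."""
    return [i for i in table.get(tconst, []) if i["sk"].startswith(sk_prefix)]


def query_by_crew(index, nconst, since_year=None):
    """Return a crew member's items in start-year order, optionally filtered."""
    rows = index.get(nconst, [])
    if since_year is not None:
        rows = [r for r in rows if r["startYear"] >= since_year]
    return rows


table = build_table(ITEMS)
crew_index = build_crew_index(ITEMS)

# All metadata for one movie comes from a single item collection.
everything = query_movie(table, "tt0000001")
# Only the crew records of that movie, selected by sort-key prefix.
crew_only = query_movie(table, "tt0000001", CREW_PREFIX)
# Movies a crew member worked on, via the index.
filmography = query_by_crew(crew_index, "nm0000001")
```

Against a real table, the two lookups would likely map to DynamoDB Query operations: one on the base table with an equality condition on tconst plus an optional begins-with condition on the sort key, and one on the GSI with an equality condition on nconst.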
The short video below demonstrates how all of these access patterns are evaluated against the target DynamoDB model.