Friday, July 17, 2020

DynamoDB

-         Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability
-         Unstructured or semi-structured data
-         Global tables
-         Runs on SSD volumes – high performance + I/O
-         Distributed across multiple AZs for resilience
-         Eventually consistent reads by default – best read throughput; consistency is usually reached within 1 second
-         Strongly consistent reads – supported, but with higher latency and lower read throughput than eventually consistent reads
-         User application can specify which consistency mode is required for reading: strong, eventual or both
-         Tables – collections of data items
-         Items – primary key (simple or composite) + group of attributes (equivalent of <row>)
          §  400 KB max per item. If more is needed – store the large attributes as objects in S3 and keep pointers to the objects as attributes
          §  No ceiling on the number of items allowed
-         Attribute – name / value (equivalent of <column>)
-         Use case: can be used for storing application (or web) session state while providing a shared data store with durability, persistence, and low latency
 
Throughput
-         Read capacity unit - represents 1 strongly consistent read per second, or 2 eventually consistent reads per second, for an item up to 4 KB in size.
          §  @ Strong consistency – per 1 sec – 1 read - 4KB
          §  @ Eventual consistency – per 1 sec – 2 reads - 8KB
          §  No fractional units – item size is rounded up to the next 4 KB; e.g.: a strongly consistent read of 5 KB = 2 capacity units
-         Write capacity unit:
          §  1 write per second for an item of 1KB
-         Pay for
          §  Provisioned read/write capacity, billed per hour – writes are more expensive than reads, so DynamoDB is most cost-efficient for read-intensive apps
          §  Index storage
          §  Internet data transfer across regions
-         DynamoDB Streams
          §  Old and new images of items; kept for 24 hours
-         Global Tables
          §  Replicated across regions – need to have streams enabled
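The read and write capacity unit rules from this section can be sketched as a small calculator. This is only an illustration of the arithmetic in the notes; the function names are made up here and are not part of any AWS SDK.

```python
import math

RCU_BLOCK_KB = 4  # one read capacity unit covers an item of up to 4 KB
WCU_BLOCK_KB = 1  # one write capacity unit covers an item of up to 1 KB


def read_capacity_units(item_kb, reads_per_sec, strongly_consistent=True):
    """Capacity units needed to read items of `item_kb` at `reads_per_sec`."""
    # Item size is rounded up to whole 4 KB blocks (no fractional units).
    blocks = math.ceil(item_kb / RCU_BLOCK_KB)
    units = blocks * reads_per_sec
    if not strongly_consistent:
        # One unit buys 2 eventually consistent reads per second.
        units = math.ceil(units / 2)
    return units


def write_capacity_units(item_kb, writes_per_sec):
    """Capacity units needed to write items of `item_kb` at `writes_per_sec`."""
    return math.ceil(item_kb / WCU_BLOCK_KB) * writes_per_sec
```

For example, `read_capacity_units(5, 1)` gives 2, which matches the "read 5KB = 2 capacity units" note above.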
 
Indexes
-         Choosing the Right DynamoDB Partition Key
-         Partition key - used for partitioning the data. Data with the same partition key is stored together, which allows you to query data with the same partition key in 1 query. The (optional) sort key determines the order of how data with the same partition key is stored.
-         Local secondary index – partition key of the index is the same as of the table. Sort key can differ
-         Global secondary index – primary key (partition and sort key) can be any two fields in the table
          §  Data is asynchronously copied out and stored separately from the table
          §  Takes reads and writes to create indexes – cost associated
          §  When an index is created – some or all of the data is copied (projected) from the table into the index. You can specify the attributes you want copied from the base table
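To make the partition key / sort key idea concrete, here is a toy in-memory model. It is not how DynamoDB is implemented and does not use any AWS API; `ToyTable` and its methods are hypothetical names used only to show that items sharing a partition key live together and come back ordered by sort key.

```python
from collections import defaultdict


class ToyTable:
    """Toy model: partition key -> list of (sort key, attributes), kept sorted."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def put(self, partition_key, sort_key, attributes):
        items = self._partitions[partition_key]
        # The (partition key, sort key) pair is unique: overwrite an existing item.
        items[:] = [(sk, attrs) for sk, attrs in items if sk != sort_key]
        items.append((sort_key, attributes))
        items.sort(key=lambda pair: pair[0])

    def query(self, partition_key):
        """Return every item with this partition key, in sort-key order."""
        return list(self._partitions.get(partition_key, []))


table = ToyTable()
table.put("user#1", "2020-07-17", {"total": 5})
table.put("user#1", "2020-01-02", {"total": 9})
table.put("user#2", "2020-03-03", {"total": 1})
```

Here `table.query("user#1")` returns both items for that user in one call, with the `2020-01-02` entry first.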

DynamoDB Local
-         Can download and run DynamoDB locally (ex: via Docker image), develop and test against it, then switch to the DynamoDB web service
-         Saves on data transfer and read/write charges

Backups
-         Can create backups – REGIONAL; all Indexes get included
-         Pay for backup storage only
-         Point-in-time recovery (continuous backups) is available; it needs to be explicitly enabled on a table
-         Restore dumps data into NEW table, not the old/original table
-         Can restore into another Region

TTL
-         https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html
-         Amazon DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming any write throughput. TTL is provided at no extra cost as a means to reduce stored data volumes by retaining only the items that remain current for your workload’s needs.
-         The scanner background process compares the current time, in Unix epoch time format in seconds, to the value stored in the user-defined attribute of an item. If the attribute is of the Number data type, its value is a timestamp in Unix epoch time format in seconds, and that timestamp is older than the current time but not five years older or more, then the item is set to expired.
-         Timestamp comparison is done in epoch format (https://www.epochconverter.com/)
-         Can take up to 48 hours to remove expired items; meanwhile these still show up in query responses
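The expiry rule described above can be sketched as a plain function. This only mirrors the comparison described in the notes; it is not the actual DynamoDB scanner, the name `is_expired` is invented for this example, and five years is approximated as 5 x 365 days.

```python
import time

FIVE_YEARS_SECONDS = 5 * 365 * 24 * 60 * 60  # approximation, ignores leap days


def is_expired(item, ttl_attribute, now=None):
    """Return True if the item's TTL attribute marks it as expired."""
    if now is None:
        now = time.time()  # current time in Unix epoch seconds
    value = item.get(ttl_attribute)
    # Only Number values count; strings and missing attributes are ignored.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # Expired when older than now, but not five years old or older.
    return now - FIVE_YEARS_SECONDS < value < now
```

Because expired items can remain in the table for a while, a reader that must not see them could apply the same check to query results.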
 

DAX – Accelerator
-         https://aws.amazon.com/dynamodb/dax/
-         fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second
-         to be used for eventual consistency reads only – not for strong consistency
-         by default - hosted in the default VPC
-         DAX Client is deployed on EC2 in VPC
-         DAX Cluster – up to 10 nodes; min 3 recommended by AWS for resilience
-         Security group with TCP on port 8111 is needed for the cluster
-         Encryption at rest is available
-         DAX Client sends request for data to the cluster
          §  Cache hit – data is found in cache, response provided
          §  Cache miss – data not found in cache, DynamoDB is queried; response data is stored on the primary node in the cluster and replicated to other nodes; response to DAX client is provided
          §  Data TTL – 5 min
-         Use cases: Micro-second read time for eventually consistent data; read intense yet cost sensitive apps
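The cache hit / cache miss flow above can be illustrated with a toy read-through cache. This is a simplified single-process model, not the DAX client or its API: `ReadThroughCache` and `fetch_from_table` are hypothetical names, and replication to other nodes is left out.

```python
import time


class ReadThroughCache:
    """Toy read-through cache with a per-entry TTL (5 minutes by default)."""

    def __init__(self, fetch_from_table, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch_from_table  # called on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: answer from memory
        # Cache miss (or expired entry): read the table, then cache the result.
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

Entries older than the TTL are fetched again from the table, which is why this pattern suits eventually consistent reads only.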
 
DB Streams
-         https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
-         DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time
-         Changes on item level, before and after, stored for 24 hours
-         Asynchronous
-         A dedicated endpoint is created by AWS for Streams: streams.<DynamoDB-endpoint> - streams.dynamodb.<region>.amazonaws.com
-         Disable a stream – data remains in place and readable for up to 24 hours

DB Transactions
-         API that allows for grouping of multiple actions together and submitting them as a single all-or-nothing TransactWriteItems or TransactGetItems operation.
-         More expensive – each item action via a transaction is 2 Read or Write Capacity Units: one to prepare the transaction and another to commit
-         Can combine up to 25 actions in one transaction – all in the same account/REGION
-         Max transaction size is 4 MB
-         All or nothing: if one change fails, none are applied. Write actions available in a transaction:
          §  Put
          §  Update
          §  Delete
          §  ConditionCheck – check if item exists or check a specific attribute of an item
-         Applied changes are propagated to all GSIs (Global Secondary Indexes). LSIs are updated automatically as these contain the actual table data
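The limits and the doubled cost noted above can be checked with a small helper. This is a sketch under assumptions: the function name is invented here, item sizes are given in KB, each item is rounded up to whole 1 KB write units, and 4 MB is taken as 4 x 1024 KB.

```python
import math

MAX_ACTIONS = 25               # actions per transaction, from the notes
MAX_TRANSACTION_KB = 4 * 1024  # 4 MB total size, from the notes


def transactional_write_units(item_sizes_kb):
    """Write capacity units consumed by one transaction, or ValueError if too big."""
    if len(item_sizes_kb) > MAX_ACTIONS:
        raise ValueError("too many actions in one transaction")
    if sum(item_sizes_kb) > MAX_TRANSACTION_KB:
        raise ValueError("transaction exceeds the size limit")
    # Each action costs double: one pass to prepare, one to commit.
    return sum(2 * math.ceil(size) for size in item_sizes_kb)
```

For items of 1 KB, 0.5 KB and 3.2 KB this gives 2 + 2 + 8 = 12 units, versus 6 for the same writes outside a transaction.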

-         Push button scaling
-         Can increase provisioned capacity any time
-         Can scale down provisioned capacity 4 times during a calendar day (UTC)
          §  To maximize: scale down four times in the first hour, then if no further adjustments are done – gain 1 scale-down opportunity every 4 hours. Can reach up to 9 scale-down operations in total
-         No limit on data that can be stored in a table
-         10,000 reads or writes per sec per table – max. If more is needed – contact AWS.
-         More than 10K will be throttled
-         Reads for an index can be throttled as well to prevent an over-consumption of capacity units
-         Throttled requests are rejected with HTTP 400 – Bad Request (ProvisionedThroughputExceededException)
-         Max 256 tables per account per region
-         Capacity unit limits vary per region
-         US East (N. Virginia) region
          §  per table - 40,000 Capacity Units for read and 40,000 CUs for write
          §  per account - 80,000 CUs for read and 80,000 CUs for write
-         All other regions
          §  per table - 10,000 CUs for read and 10,000 CUs for write
          §  per account - 20,000 CUs for read and 20,000 CUs for write

DynamoDB Kinesis Adapter
-        Using the Amazon Kinesis Adapter is the recommended way to consume streams from Amazon DynamoDB. The DynamoDB Streams API is intentionally similar to that of Kinesis Data Streams, a service for real-time processing of streaming data at massive scale

DynamoDB best practice
-         Keep item size under 400KB
-         Separate most frequently accessed data into separate tables
-         Store larger items in S3 and save object pointers in a table
-         Make primary key value part of S3 object metadata
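The S3 pointer pattern from the list above could look roughly like this. It is a sketch only: nothing is uploaded or written, the bucket name, attribute names and `split_item` are all hypothetical, and the size check is rough because it ignores attribute names and other attributes.

```python
MAX_ITEM_BYTES = 400 * 1024  # 400 KB item size limit from the notes


def split_item(primary_key, payload, bucket="example-large-items"):
    """Return (table_item, s3_object); s3_object is None when the payload fits."""
    if len(payload) < MAX_ITEM_BYTES:
        # Small enough: keep the payload inline in the table item.
        return {"id": primary_key, "payload": payload}, None
    object_key = f"items/{primary_key}"
    # Large payload: the table item only stores a pointer to the S3 object.
    table_item = {
        "id": primary_key,
        "payload_s3_bucket": bucket,
        "payload_s3_key": object_key,
    }
    # The primary key goes into the object metadata to link it back to the item.
    s3_object = {
        "Bucket": bucket,
        "Key": object_key,
        "Body": payload,
        "Metadata": {"primary_key": primary_key},
    }
    return table_item, s3_object
```

The returned `s3_object` dictionary would then be handed to whatever upload code the application uses.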

Use Amazon DynamoDB Accelerator (DAX) from AWS Lambda to increase ...
