Sunday, July 19, 2020

Kinesis


-         https://docs.aws.amazon.com/kinesis/index.html
-         Data streaming and short-term storage: ingests data in small chunks from numerous sources and makes it available quickly for downstream consumption. No data transformation.
-         Managed streaming data service that handles terabytes per hour from hundreds of thousands of sources:
          §  Kinesis Streams
          §  Kinesis Firehose
          §  Kinesis Analytics
-         Use cases: sensor data, online gaming, social media posts, financial market data

Kinesis Stream
-         Accepts huge amounts of data from producers and makes it available in real time to consuming applications
-         https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html
-         For each Amazon Kinesis Data Streams application, the KCL uses a unique Amazon DynamoDB table to keep track of the application's state. Because the KCL uses the name of the Amazon Kinesis Data Streams application to create the name of the table, each application name must be unique.
-         A shard is a uniquely identified sequence of data records in a stream
-         Each shard can support:
          §  up to 5 transactions per second for reads
          §  up to a maximum total data read rate of 2 MB per second
          §  up to 1,000 put records per second for writes (record size affects the rate)
          §  up to a maximum total data write rate of 1 MB per second (including partition keys).
-         Retention period: 24 hours by default; can be extended up to 7 days at additional cost
-         Server-side encryption within the stream is available via AWS KMS; producers and consumers must have access to the master key. Chargeable service.
-         Fully managed service – takes care of scaling, networking, etc.
-         Data is replicated synchronously across 3 AZs for resilience
-         The output of one stream can be the input of another stream
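
The per-shard limits listed above can be turned into a rough sizing estimate. The following is a minimal sketch only: the helper name `estimate_shards` and the workloads in the usage note are hypothetical, 1 MB is treated as 1,000 KB for simplicity, and consumers are assumed to share each shard's 2 MB per second read limit.

```python
import math

# Per-shard limits as listed in the notes above.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Rough lower bound on the shard count for a workload (hypothetical helper)."""
    # Assumption: 1 MB == 1,000 KB, to keep the arithmetic simple.
    write_mb_per_sec = records_per_sec * avg_record_kb / 1000.0

    # Shards needed to stay under the write byte-rate limit.
    by_write_bytes = write_mb_per_sec / WRITE_MB_PER_SEC
    # Shards needed to stay under the write record-count limit.
    by_write_count = records_per_sec / WRITE_RECORDS_PER_SEC
    # Assumption: every consumer reads the full stream and all consumers
    # share the per-shard read limit.
    by_read_bytes = write_mb_per_sec * consumers / READ_MB_PER_SEC

    return max(1, math.ceil(max(by_write_bytes, by_write_count, by_read_bytes)))
```

For example, `estimate_shards(5000, 2)` is bound by the write byte rate (10 MB per second), while `estimate_shards(3000, 0.1)` is bound by the record count. The result is a starting point for planning, not a guarantee: uneven partition keys can still overload a single shard.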

Kinesis Firehose
-         https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
-         fully managed service for delivering real-time streaming data to destinations such as
          §  Amazon Simple Storage Service (Amazon S3)
          §  Amazon Redshift
          §  Amazon Elasticsearch Service (Amazon ES)
          §  Splunk
-         Can transform data before delivering to S3, etc.
-         Kinesis Streams can be used as data source
-         Streams – need to write applications to read data from a stream
-         Firehose – no need to write code to get data into S3 etc.
-         Can compress and encrypt data before delivering
-         If data encryption is required, Streams needs to be the delivery mechanism; Streams performs the encryption, and Firehose must be given access to the master key in KMS
-         Data is replicated synchronously across 3 AZs for resilience
-         Can buffer data for up to 24 hours in case the receiving application is disconnected
-         Can invoke a Lambda function for data transformation before delivery
-         If the destination is Redshift, data is first delivered to S3; Firehose then copies it from S3 to the Redshift cluster
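
The Lambda transformation step mentioned above can be sketched as follows. Firehose passes the function a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` status, and base64-encoded `data`. The JSON payload shape and the `message` field are assumptions made for illustration; real records may look quite different.

```python
import base64
import json


def lambda_handler(event, context):
    """Upper-case a hypothetical 'message' field in each JSON record."""
    output = []
    for record in event["records"]:
        try:
            # Incoming data is base64-encoded; assume it holds a JSON object.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["message"] = str(payload.get("message", "")).upper()
            # Append a newline so delivered records land on separate lines.
            transformed = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transformed).decode("utf-8"),
            })
        except (ValueError, AttributeError, TypeError):
            # Not valid base64/JSON, or not a JSON object: return it unchanged
            # and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Each input record gets exactly one output entry with a matching `recordId`. A third status, `Dropped`, is available for records that should be discarded on purpose.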

Kinesis Analytics
-         https://docs.aws.amazon.com/kinesisanalytics/latest/dev/what-is.html
-         With Amazon Kinesis Data Analytics for SQL Applications, you can process and analyze streaming data using standard SQL. The service enables you to quickly author and run powerful SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics.
-         Can have Stream or Firehose as source or as destination
-         Need IAM roles set up to read from streaming sources and write to destinations
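
As a rough illustration of the IAM setup, the sketch below builds a permissions policy that lets an analytics application read from a Kinesis stream and write to a Firehose delivery stream, together with a trust policy for the service. The function name and the ARNs in the usage note are hypothetical placeholders, and the action lists are a plausible minimum rather than a verified complete set, so check them against the current AWS documentation before use.

```python
import json


def build_analytics_policy(input_stream_arn, output_firehose_arn):
    """Build a permissions policy document as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to the streaming source.
                "Sid": "ReadInputStream",
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": input_stream_arn,
            },
            {
                # Write access to the destination delivery stream.
                "Sid": "WriteOutputFirehose",
                "Effect": "Allow",
                "Action": [
                    "firehose:DescribeDeliveryStream",
                    "firehose:PutRecord",
                    "firehose:PutRecordBatch",
                ],
                "Resource": output_firehose_arn,
            },
        ],
    }


# Trust policy that allows the Kinesis Analytics service to assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "kinesisanalytics.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
```

Calling `json.dumps(build_analytics_policy("arn:aws:kinesis:us-east-1:123456789012:stream/example-input", "arn:aws:firehose:us-east-1:123456789012:deliverystream/example-output"), indent=2)` produces a document that can be pasted into the IAM console; both ARNs there are made-up examples.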

[Diagram: a data analytics application with streaming input sources, reference data, and application output.]

Kinesis Data Streams vs. Firehose
-         Source: https://jayendrapatil.com/aws-kinesis-data-streams-vs-kinesis-firehose/



