Aleks on AWS: CloudWatch

CloudWatch

- https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html
-         Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications, and services that run on AWS and on-premises servers
-         Repository of CloudWatch metrics (built-in AWS metrics and custom metrics) + Alarms
-         Metrics – time-ordered set of data points representing a measurable variable of an application or resource; can be published privately through an interface VPC endpoint
-         You can add datapoints in any order at any rate you choose
-         Access: Console, API, CLI, SDK
-         Namespace – container for metrics; metrics in different namespaces are isolated from each other. AWS services use AWS/<service> (e.g. AWS/EC2)

Metrics
- REGIONAL

-         Uniquely identified by: name, namespace and 0 or more dimensions. Each data point also carries a timestamp and (optionally) a unit of measure
-         Timestamp – can be up to 2 weeks in the past or up to 2 hours into the future
-         Metrics can’t be deleted, but they expire automatically if no new data is published for 15 months
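To make the identity and timestamp rules above concrete, here is a toy model. It is not the CloudWatch API. `MetricKey`, `timestamp_accepted` and the `MyApp` namespace are hypothetical names. The 2-week / 2-hour window is taken from the notes, and the assumption is that both boundaries are inclusive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Toy model, not the CloudWatch API: a metric's identity is its namespace,
# its name and the full set of (dimension name, dimension value) pairs.
@dataclass(frozen=True)
class MetricKey:
    namespace: str
    name: str
    dimensions: frozenset = frozenset()


# Assumed acceptance window (from the notes); boundaries treated as inclusive.
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def timestamp_accepted(ts: datetime, now: datetime) -> bool:
    """True if a data point timestamp falls inside the accepted window."""
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


if __name__ == "__main__":
    now = datetime(2020, 7, 13, 12, 0, tzinfo=timezone.utc)
    prod = MetricKey("MyApp", "Latency", frozenset({("Env", "prod")}))
    dev = MetricKey("MyApp", "Latency", frozenset({("Env", "dev")}))
    print(prod == dev)  # False: same name, different dimensions
    print(timestamp_accepted(now - timedelta(days=10), now))  # True
    print(timestamp_accepted(now + timedelta(hours=3), now))  # False
```

In this model, changing a single dimension value produces a different metric. That matches the "name + namespace + dimensions" rule.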

- The finest resolution CloudWatch supports is 1-second data points (a high-resolution metric); standard-resolution metrics are stored at 1-minute granularity

- If you do not mark a metric as high resolution by setting the StorageResolution field in the PutMetricData API request (1 = high resolution), CloudWatch by default aggregates and stores it at 1-minute resolution

- https://aws.amazon.com/cloudwatch/faqs/#:~:text=Q%3A%20What%20is%20the%20minimum,metrics%20at%201%2Dminute%20granularity.
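Here is a rough sketch of what the StorageResolution setting changes. It assumes standard-resolution points are rolled up into per-minute statistic sets (SampleCount, Sum, Minimum, Maximum). The `store` function is a hypothetical toy model, not how CloudWatch is implemented internally.

```python
from collections import defaultdict


def store(points, storage_resolution=60):
    """Toy model of how StorageResolution changes what gets kept.

    points: list of (epoch_seconds, value) tuples from PutMetricData calls.
    storage_resolution: 1 (high resolution) or 60 (standard, the default).
    Returns {bucket_start: statistic set} with SampleCount/Sum/Minimum/Maximum.
    """
    if storage_resolution not in (1, 60):
        raise ValueError("StorageResolution must be 1 or 60")
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate each timestamp to the start of its 1 s or 60 s bucket.
        buckets[ts - ts % storage_resolution].append(value)
    return {
        start: {
            "SampleCount": len(vals),
            "Sum": sum(vals),
            "Minimum": min(vals),
            "Maximum": max(vals),
        }
        for start, vals in sorted(buckets.items())
    }


if __name__ == "__main__":
    points = [(0, 5.0), (10, 7.0), (30, 3.0), (65, 1.0)]
    print(store(points))     # two 1-minute buckets (0 and 60)
    print(store(points, 1))  # four 1-second buckets, nothing merged
```

At the default resolution, the three points published within the first minute collapse into one statistic set. That is why sub-minute detail is lost unless you opt in to high resolution.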

- Data points with a period of:

      § < 60 sec – available for 3 hours
      § 60 sec (1 minute) – available for 15 days
      § 300 sec (5 minutes) – available for 63 days
      § 3600 sec (1 hour) – available for 455 days (15 months)
      § Older than 15 months – removed on a rolling basis
      § Granular data points are aggregated into lower-resolution data when their retention period expires – meaning you see 60-sec data for 15 days, after which it is only available at 300-sec resolution until the 63-day mark, and so on
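The retention tiers above can be encoded as a small lookup. It answers "what is the finest period I can still query for data this old?". `finest_period` is a hypothetical helper. It assumes the metric was published at 1-second (high) resolution. A standard-resolution metric never offers anything finer than 60 sec.

```python
from datetime import timedelta

# (maximum age, finest period in seconds still available), from the list above.
RETENTION_TIERS = [
    (timedelta(hours=3), 1),      # < 60 sec periods (high resolution only)
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),  # 15 months
]


def finest_period(age: timedelta):
    """Finest period (seconds) still queryable for data of this age.

    Returns None once the data is older than 455 days (it has been removed).
    """
    for max_age, period in RETENTION_TIERS:
        if age <= max_age:
            return period
    return None


if __name__ == "__main__":
    print(finest_period(timedelta(days=30)))  # 300: only 5-minute data left
```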

Alarms
-         Watch a single metric or a math expression over multiple metrics
-         Can alarm on log data: create a metric filter on a log group, then alarm on the resulting metric
-         States: OK, ALARM, INSUFFICIENT_DATA
-         Settings when creating an alarm:
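      § Period (sec) – length of time over which the metric is evaluated to produce each data point (ex: 60 sec = 1 data point every minute)
      § Evaluation periods – number of most recent periods to consider (ex: 5 periods of 1 min = 5 data points)
      § Datapoints to alarm – how many of the data points within the evaluation periods must breach the threshold to trigger the alarm; they don't need to be consecutive
-         High-resolution alarms – evaluate high-resolution metrics with a period under 60 sec – cost more than standard alarms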
-         Alarm actions are very limited (compared to Events):
      § EC2 (stop, terminate, reboot, recover)
      § EC2 Auto Scaling
      § SNS
-         CAN’T invoke a Lambda function or an SQS queue directly (at the time of writing); route through SNS instead, which can deliver to both
-         After a state change invokes an action, what happens next depends on the action type:
      § EC2 Auto Scaling – the action keeps being invoked for as long as the alarm stays in the ALARM state
      § SNS – no additional notifications are sent
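The Period / Evaluation periods / Datapoints to alarm settings describe an "M out of N" check. Here is a simplified sketch of that check. It assumes a GreaterThanThreshold comparison. It also simplifies missing data: missing points are skipped, and the state is INSUFFICIENT_DATA only if the whole window is missing. Real alarms have a configurable TreatMissingData setting. `evaluate_alarm` is a hypothetical name.

```python
from typing import Optional, Sequence


def evaluate_alarm(datapoints: Sequence[Optional[float]], threshold: float,
                   evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Toy "M out of N" evaluation for a GreaterThanThreshold alarm.

    datapoints: one value per period, oldest first; None means missing data.
    Only the last `evaluation_periods` values are considered. The alarm fires
    if at least `datapoints_to_alarm` of them breach the threshold, whether
    or not the breaches are consecutive.
    """
    window = list(datapoints)[-evaluation_periods:]
    present = [v for v in window if v is not None]
    if not present:
        return "INSUFFICIENT_DATA"
    breaches = sum(1 for v in present if v > threshold)
    return "ALARM" if breaches >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    # 3 of the last 5 one-minute data points exceed 80 -> ALARM,
    # even though the breaches are not consecutive.
    print(evaluate_alarm([90, 10, 95, 20, 91], 80, 5, 3))
```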

Logs
-         Collects and monitors logs from AWS services, your applications and on-premises servers
-         Can create custom queries over logs from: EC2, CloudTrail, Route 53 DNS query logs, RDS, Neptune, VPC Flow Logs, Elastic Beanstalk, API Gateway, Lambda, etc.
-         Log event – a record of activity written by the monitored application or resource: a timestamp plus the raw event message
-         Log Streams – sequence of events that share the same source (empty ones deleted after 2 months)
-         Log Groups – groups of log streams (ex: VPC ENI Flow Logs); all streams in a group share the same retention, monitoring and access control settings. Every stream must belong to a group, and there is no limit on the number of streams per group. Ex: group all your Apache logs into a single group
-         Log Retention – logs don’t expire by default; set a retention period at the Log Group level (1 day to 10 years) to avoid paying to store old logs indefinitely
-         Metric Filters – custom filters that extract metric observations (failed logins, etc.) from log events and turn them into data points in a CloudWatch metric. CloudWatch Logs sends these metrics to CloudWatch every minute.
-         Encryption – log data is encrypted in transit into CloudWatch Logs and at rest. By default CloudWatch Logs manages the keys; you can instead use a KMS customer master key (CMK), which requires granting CloudWatch Logs permission to use that CMK. Configured at the Log Group level.
      § At rest with a KMS CMK – can be turned on only via the CLI, and applies only to data received after the key is associated
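Here is a toy version of a metric filter, using the failed-logins example above. It counts matching log events and buckets them into the 1-minute interval in which the metric is published. A plain Python regular expression stands in for CloudWatch's own filter-pattern syntax. `metric_filter` and the sample messages are hypothetical.

```python
import re
from collections import Counter


def metric_filter(log_events, pattern, metric_value=1):
    """Toy metric filter: turn matching log events into per-minute data points.

    log_events: iterable of (epoch_millis, message) pairs.
    pattern: a Python regex (a stand-in for CloudWatch filter-pattern syntax).
    Returns {minute_start_millis: summed metric value}.
    """
    regex = re.compile(pattern)
    points = Counter()
    for ts_ms, message in log_events:
        if regex.search(message):
            minute = ts_ms - ts_ms % 60_000  # start of the 1-minute interval
            points[minute] += metric_value
    return dict(points)


if __name__ == "__main__":
    events = [
        (0, "Failed login for bob"),
        (30_000, "Login ok for amy"),
        (59_999, "Failed login for eve"),
        (60_000, "Failed login for bob"),
    ]
    print(metric_filter(events, r"Failed login"))
```

Each resulting value is what an alarm on the "failed logins" metric would then evaluate.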

CloudWatch Logs Insights
- https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html

- Use query language to search logs
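-         Fields are discovered automatically in JSON-formatted logs; other text logs can still be queried by extracting fields with the parse command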
-         One query request can search up to 20 log groups from EC2, Route 53, Lambda, CloudTrail, VPC or a custom app
-         Only log data ingested on or after Nov 5, 2018 can be queried
-         Ex: search for errors in EC2 logs, feed them into a CloudWatch metric and then trigger an alarm
-         Log data can be sent to S3 – can use same bucket for multiple log groups
-         Exporting log data to S3 can take up to 12 hours. Insights – near real time analysis
-         Can stream log data to Elasticsearch in near real time (additional costs apply)
-         Example:
      § CloudWatch – create a Log Group
      § IAM – create an IAM role that has permission to publish to the Amazon CloudWatch Logs log group
      § EC2 ENI – create a VPC Flow Log that publishes into the CW Log Group
      § On the CW Log Group – create a metric filter matching an IP address to produce a custom metric
      § On that metric – create an Alarm that sends SNS notifications
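Here is a sketch of what the "filter by an IP" step matches on. It parses records in the default (version 2) VPC Flow Log format and counts those from one source address. `parse_flow_record`, `count_from_ip` and the sample records are hypothetical, and the addresses come from documentation ranges. In CloudWatch itself, this matching would be done by the metric filter.

```python
# Field order of the default (version 2) VPC Flow Log record format.
FLOW_LOG_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]


def parse_flow_record(line: str) -> dict:
    """Split a space-separated flow log record into named fields."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}")
    return dict(zip(FLOW_LOG_FIELDS, values))


def count_from_ip(lines, ip: str, action=None) -> int:
    """Count records whose source address is `ip` (optionally ACCEPT/REJECT only)."""
    count = 0
    for line in lines:
        rec = parse_flow_record(line)
        if rec["srcaddr"] == ip and (action is None or rec["action"] == action):
            count += 1
    return count


if __name__ == "__main__":
    records = [
        "2 123456789012 eni-0a1b2c3d 203.0.113.12 10.0.0.5 49152 443 6 10 840 1594600000 1594600060 ACCEPT OK",
        "2 123456789012 eni-0a1b2c3d 198.51.100.7 10.0.0.5 49153 22 6 3 180 1594600000 1594600060 REJECT OK",
        "2 123456789012 eni-0a1b2c3d 203.0.113.12 10.0.0.5 49154 22 6 2 120 1594600000 1594600060 REJECT OK",
    ]
    print(count_from_ip(records, "203.0.113.12"))            # 2
    print(count_from_ip(records, "203.0.113.12", "REJECT"))  # 1
```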

Unified CloudWatch Agent
- https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html

- A software agent installed on each EC2 instance or on-premises server (Linux or Windows Server) that you want to collect logs from

-         Sends logs and system-level metrics (e.g. memory and disk usage, which EC2 doesn't report by default) from the host to CloudWatch
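The agent is driven by a JSON configuration file. Here is a minimal sketch that generates one: it collects memory and disk usage plus a single log file. The key names follow the agent's configuration schema as I understand it, so check them against the official agent configuration reference before use. `build_agent_config`, the file path and the log group name are hypothetical.

```python
import json


def build_agent_config(log_file: str, log_group: str) -> dict:
    """Minimal agent config: memory/disk metrics plus one log file.

    Key names are assumed from the CloudWatch agent config schema.
    """
    return {
        "agent": {"metrics_collection_interval": 60},
        "metrics": {
            "metrics_collected": {
                # Memory and disk usage are not part of the default EC2 metrics.
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["*"]},
            }
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file,
                            "log_group_name": log_group,
                            # One log stream per instance.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    config = build_agent_config("/var/log/httpd/access_log", "apache-access-logs")
    print(json.dumps(config, indent=2))
```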

Real-Time Log Processing
-         Use a subscription filter to stream log events in real time to Lambda, Kinesis Data Firehose or Kinesis Data Streams
      § This is an alternative to exporting logs to S3 (which can take hours) and reading them from the bucket
      § Create the destination (one of the above) and an IAM role that allows CloudWatch Logs to write to it
      § Create a subscription filter on the Log Group
      § Can be done across accounts, but only for Kinesis Data Streams
      § The Log Group and the destination must be in the same region, but the resource the destination points to can be in a different region
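When the destination is a Lambda function, CloudWatch Logs delivers the matched events as gzip-compressed JSON. The payload is base64-encoded under `event["awslogs"]["data"]`. Here is a minimal handler sketch that decodes it. The handler body, which just prints each event, is a placeholder for real processing.

```python
import base64
import gzip
import json


def decode_subscription_event(event: dict) -> dict:
    """Decode the payload CloudWatch Logs sends to a Lambda subscription."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def lambda_handler(event, context):
    payload = decode_subscription_event(event)
    # CONTROL_MESSAGE payloads only check that the destination is reachable.
    if payload.get("messageType") != "DATA_MESSAGE":
        return 0
    for log_event in payload["logEvents"]:
        # Placeholder processing: print group, timestamp (ms) and message.
        print(payload["logGroup"], log_event["timestamp"], log_event["message"])
    return len(payload["logEvents"])
```

Decoding in a separate function keeps the handler testable: you can build a sample payload locally with `gzip.compress` and `base64.b64encode` and feed it in without AWS.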

Events
-         Can do much more with Events than with Alarms
-         Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources
-         Create rules to invoke Targets based on Events happening in your AWS environment:
      § Event Source | Target
      § Event Pattern: Service Name, Event Type
      § Schedule: fixed rate (every N minutes/hours/days) or a cron expression
-         Can send events to another account (cross-account delivery); the sender pays

- Targets: SNS, Lambda, EC2 API Call
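Rules select events by matching an event pattern against each event's JSON. Here is a simplified matcher. It supports only exact values: each pattern leaf is a list of allowed values, and nested objects are matched recursively. Richer content filters are not modelled. `matches` is a hypothetical helper. The sample pattern targets the EC2 instance state-change event.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified event-pattern matching (exact-value matching only).

    Every pattern leaf is a list of allowed values; nested dicts match nested
    event fields; event fields the pattern doesn't mention are ignored.
    Content filters such as prefix, numeric or anything-but are NOT modelled.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # An array field matches if any of its values is allowed.
            if not any(value in expected for value in actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    pattern = {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped", "terminated"]},
    }
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    print(matches(pattern, event))  # True: a rule with this pattern would fire
```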

Monday, July 13, 2020
