Monday, July 13, 2020

S3 – Simple Storage Service

S3 – Intro: Simple Storage Service
-         https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html
Concepts
-         Block storage – suited to transactional databases: random read/write, structured storage
-         Block storage – stores data in blocks and only keeps an index of where each block resides
-         Object storage – keeps data files whole
          §  Metadata plus a unique object ID (key)
          §  Suitable for distributed storage architectures
          §  Runs on cheaper hardware than block storage
          §  High availability and durability
          §  Cannot be mounted as a drive or directory on an EC2 instance
-         Data consistency models
          §  Immediate (strong) consistency – an object is written to multiple nodes, and simultaneous reads from any of them return the same, consistent view
          §  Eventual consistency – replicas converge over time; until they do, reads may return different results, so you may need to wait before every read returns the same data
          §  Eventual consistency – holding mechanism: as a precaution, reads to a node are blocked while a write to it is in progress. When a file is updated, a read returns either the old version or the new one, never a mix
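The difference between the two models can be sketched with a toy, single-threaded Python model of one object replicated on a few nodes. The class name, the replica count, and the "update replica 0 first" rule for the eventual case are hypothetical choices for illustration, not a description of how S3 is built. Each write stores the complete value in one assignment, so a read sees the whole old object or the whole new one, never a mix.

```python
class ReplicatedStore:
    """Toy model: one object replicated on several nodes (hypothetical, not S3)."""

    def __init__(self, num_replicas=3, strong=True):
        self.strong = strong
        self.replicas = [None] * num_replicas
        self.pending = []  # (replica index, value) pairs not yet applied

    def write(self, value):
        # The complete new value is stored with one assignment per replica,
        # so a reader sees the whole old object or the whole new one.
        if self.strong:
            # Strong consistency: every replica is updated before write() returns.
            for i in range(len(self.replicas)):
                self.replicas[i] = value
        else:
            # Eventual consistency: only replica 0 is updated now;
            # the others are queued for background replication.
            self.replicas[0] = value
            self.pending.extend((i, value) for i in range(1, len(self.replicas)))

    def read(self, replica):
        return self.replicas[replica]

    def propagate(self):
        # Background replication catches the lagging replicas up.
        for i, value in self.pending:
            self.replicas[i] = value
        self.pending.clear()


if __name__ == "__main__":
    eventual = ReplicatedStore(strong=False)
    eventual.write(b"v1")
    print(eventual.read(0), eventual.read(2))  # b'v1' None  (replica 2 is stale)
    eventual.propagate()
    print(eventual.read(2))                    # b'v1'
```

With strong=True the stale read cannot happen, because write() only returns after every replica holds the new value.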


S3 – Object Based Storage
-         Read-after-write – immediate (strong) consistency for HTTP PUTs of NEW objects
-         PUTs that overwrite existing objects, and DELETEs – eventual consistency
-         Updates are atomic – a GET after a PUT returns either the old or the new version, never partial data
-         S3 stores objects redundantly across multiple Availability Zones within a Region; S3 Standard is designed for 99.999999999% (11 nines) durability and 99.99% availability
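The 11 nines figure is a durability design target. A back-of-the-envelope reading of it, under the simplifying assumption (not an AWS guarantee) that each object independently has an annual loss probability of 10^-11; the function names are hypothetical:

```python
def expected_losses_per_year(num_objects, nines=11):
    """Expected objects lost per year, ASSUMING each object independently
    has an annual loss probability of 10**-nines (simplification)."""
    annual_loss_probability = 10.0 ** -nines
    return num_objects * annual_loss_probability


def years_per_lost_object(num_objects, nines=11):
    """Average number of years between single-object losses under that assumption."""
    return 1.0 / expected_losses_per_year(num_objects, nines)


if __name__ == "__main__":
    print(expected_losses_per_year(10_000_000))      # roughly 1e-4
    print(round(years_per_lost_object(10_000_000)))  # 10000
```

So with 10 million objects you would expect, on average, about one lost object every 10,000 years.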

Buckets
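-         Region-specific: a bucket lives in the Region where you create it
-         Key = the object's name within the bucket (terminology)
-         No limit on the number of objects; each object can be 0 bytes to 5 TB
-         Flat namespace: buckets can't be nested, and there are no real sub-folders (console "folders" are just key prefixes)
-         Bucket ownership can't be transferred to another account
-         Up to 100 buckets per account by default; you can request more from AWS
-         Bucket names – globally unique, 3-63 characters, lowercase letters, numbers, hyphens (and periods); deleting a bucket releases its name
-         3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket. There is no limit on the number of prefixes, so spreading keys across prefixes scales throughput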
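A minimal Python sketch of the naming rules above, plus a few more from the AWS bucket naming rules: periods are allowed, a name must start and end with a letter or number, two adjacent periods are not allowed, and a name must not look like an IPv4 address. It covers only a subset of the published rules and only checks the format; global uniqueness can only be confirmed by actually creating the bucket. The function name is hypothetical.

```python
import ipaddress
import re

# Subset of the S3 bucket naming rules: 3-63 characters; lowercase letters,
# digits, hyphens and periods; first and last character a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    if not _NAME_RE.fullmatch(name):
        return False
    # Names formatted like an IPv4 address (e.g. 192.168.5.4) are not allowed.
    try:
        ipaddress.IPv4Address(name)
        return False
    except ValueError:
        pass
    # Two adjacent periods are not allowed either.
    return ".." not in name


if __name__ == "__main__":
    for candidate in ["my-logs-2020", "My-Logs", "ab", "192.168.5.4"]:
        print(candidate, is_valid_bucket_name(candidate))
```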

Sub-resources
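-         Sub-resources belong to a bucket or an object and can't exist on their own; they are managed via the SDK / API
-         By default, the bucket, the objects in it, and all sub-resources are private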

Bucket sub-resources
-         Lifecycle, website (static website hosting URL), versioning, access control lists (ACLs), CORS, policies (JSON), logging
-         Bucket Policy
          §  A bucket policy is a resource-based AWS Identity and Access Management (IAM) policy. You add a bucket policy to a bucket to grant other AWS accounts or IAM users access permissions for the bucket and the objects in it
-         Bucket ACL – use it to grant permissions to another group, e.g. the Log Delivery group, so it can write access logs into your S3 bucket
-         Versioning – once enabled, it can't be disabled, only suspended. Applies to ALL objects in the bucket.
          §  Will protect existing and new objects
          §  Only owner can permanently delete objects
          §  Versioning ON – a simple DELETE places a delete marker on the object, but the object itself is kept. Removing the delete marker restores the object.
          §  Lifecycle policies can remove older versions or transition them to Glacier
-         Versioning off (never enabled) / suspended
          §  Objects stored in this state get a version ID of NULL
          §  Overwriting an existing object while suspended replaces the NULL version; earlier non-NULL versions are kept
          §  Deleting an object while suspended removes the NULL version and adds a delete marker; all previous versions remain

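-         MFA Delete
          §  A bucket's versioning configuration can be MFA Delete–enabled; permanently deleting an object version or changing the bucket's versioning state then requires a valid MFA code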
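To make delete markers concrete, here is a toy in-memory model of a versioning-enabled bucket in Python. ToyVersionedBucket and its methods are hypothetical (this is not the boto3 API), version IDs are simple counters instead of S3's opaque strings, and suspended versioning and MFA Delete are not modeled.

```python
import itertools


class DeleteMarker:
    """Placeholder version: hides the object without deleting any data."""


class ToyVersionedBucket:
    """Hypothetical in-memory model of ONE versioning-enabled bucket (not boto3)."""

    def __init__(self):
        self._versions = {}  # key -> [(version_id, data or DeleteMarker)], oldest first
        self._next_id = itertools.count(1)

    def _add(self, key, item):
        version_id = str(next(self._next_id))
        self._versions.setdefault(key, []).append((version_id, item))
        return version_id

    def put(self, key, data):
        # Every PUT creates a new version; older versions are kept.
        return self._add(key, data)

    def delete(self, key):
        # A simple DELETE (no version ID) only stacks a delete marker on top.
        return self._add(key, DeleteMarker())

    def delete_version(self, key, version_id):
        # Deleting a specific version is permanent. If that version is the
        # delete marker, the previous version becomes current again (restore).
        self._versions[key] = [(v, d) for v, d in self._versions[key] if v != version_id]

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or isinstance(versions[-1][1], DeleteMarker):
                raise KeyError(key)  # real S3 answers 404 Not Found here
            return versions[-1][1]
        for v, data in versions:
            if v == version_id and not isinstance(data, DeleteMarker):
                return data
        raise KeyError(f"{key} version {version_id}")


if __name__ == "__main__":
    bucket = ToyVersionedBucket()
    v1 = bucket.put("report.txt", b"draft")
    bucket.put("report.txt", b"final")
    marker = bucket.delete("report.txt")    # object now looks deleted...
    print(bucket.get("report.txt", v1))     # b'draft'  (...but old versions survive)
    bucket.delete_version("report.txt", marker)
    print(bucket.get("report.txt"))         # b'final'  (removing the marker restored it)
```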

Object sub-resources
-         Object ACL - Amazon S3 access control lists (ACLs) enable you to manage access to buckets and objects. Each bucket and object has an ACL attached to it as a sub-resource. It defines which AWS accounts or groups are granted access and the type of access. When a request is received against a resource, Amazon S3 checks the corresponding ACL to verify that the requester has the necessary access permissions. When you create a bucket or an object, Amazon S3 creates a default ACL that grants the resource owner full control over the resource
-         ACL is the only way to manage access to an object not owned by the bucket owner. Bucket owner can do three things to another user’s object deposited in his bucket:
          §  Deny access to object
          §  Delete object
          §  Archive object
-         Torrent – reduce the number of GETs (and bandwidth) on the bucket via BitTorrent: one client downloads from S3 and the rest download from that peer
-         Permissions
          §  Cross-account permissions – to read or write into a bucket
          §  Bucket – owned by the account that created it. Object – owned by whoever uploaded it into the bucket; by default the bucket owner has no permissions on it (not even read), except that the bucket owner:
                 §  pays the bill for storing the object
                 §  can deny access to it
                 §  can delete other accounts' objects
                 §  can archive/restore them
                 §  can grant permissions to users / accounts / everyone / only authenticated AWS users
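A small Python model of ACL grants as (grantee, permission) pairs. The permission names READ, WRITE, READ_ACP, WRITE_ACP and FULL_CONTROL are the real S3 ACL permissions; the account strings and the AllUsers / AuthenticatedUsers labels (standing in for S3's predefined group URIs) are hypothetical. It shows why an uploader's object starts out unreadable by the bucket owner until the uploader adds a grant.

```python
from enum import Enum

# Hypothetical labels standing in for S3's predefined group grantee URIs.
ALL_USERS = "AllUsers"                      # everyone, including anonymous requests
AUTHENTICATED_USERS = "AuthenticatedUsers"  # any signed-in AWS account


class Perm(Enum):
    READ = "READ"
    WRITE = "WRITE"
    READ_ACP = "READ_ACP"    # read the ACL itself
    WRITE_ACP = "WRITE_ACP"  # change the ACL
    FULL_CONTROL = "FULL_CONTROL"


def default_acl(owner):
    # A new bucket or object starts with a single grant: owner -> FULL_CONTROL.
    return [(owner, Perm.FULL_CONTROL)]


def acl_allows(acl, requester, needed):
    """True if any grant gives `requester` the `needed` permission.

    `requester` is an account ID string, or None for an anonymous request.
    """
    for grantee, perm in acl:
        matches = (
            grantee == ALL_USERS
            or (grantee == AUTHENTICATED_USERS and requester is not None)
            or grantee == requester
        )
        if matches and perm in (needed, Perm.FULL_CONTROL):
            return True
    return False


if __name__ == "__main__":
    # "uploader" puts an object into a bucket owned by "bucket-owner".
    obj_acl = default_acl("uploader")
    print(acl_allows(obj_acl, "bucket-owner", Perm.READ))  # False: not even read
    obj_acl.append(("bucket-owner", Perm.READ))            # uploader grants read
    print(acl_allows(obj_acl, "bucket-owner", Perm.READ))  # True
```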

Access Policies
-         Policies can be attached to a resource OR to a user, giving different levels of granularity
-         If you’re more interested in “What can this user do in AWS?” then IAM policies are probably the way to go. You can easily answer this by looking up an IAM user and then examining their IAM policies to see what rights they have.
-         If you’re more interested in “Who can access this S3 bucket?” then S3 bucket policies will likely suit you better. You can easily answer this by looking up a bucket and examining the bucket policy.
-         Resource-based - attached to object / bucket:
          §  ACL – on bucket / object, list of grants – who can do what. Basic permissions only - read/write by another account or user
          §  Bucket policies – JSON-based; grant other accounts or IAM users permissions on a bucket OR on the objects inside it. Apply only to objects owned by the bucket owner's account.
          §  Cross-account – if you want to provide another account FULL S3 permissions, you can only do this via Bucket policy
-         User (IAM) access policies:
                 §  To access a resource, a user needs permission from:
                 §  Their parent account – user policy
                 §  The resource owner – a bucket policy granting the user OR the user's account, OR an object ACL
                 §  Use IAM to manage access to S3 resources: attach an access policy to an IAM user, group, or role
                 §  You can't grant anonymous (public, i.e. to everyone) permissions in an IAM policy, because it is attached to a specific user / group / role; use a bucket policy or ACL for that
                 §  Attach to a user / group / role
                 §  Cross-account – the grantee (receiving) account can pass access to the other account's resource on to a subset of its own users
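-         To allow/deny access, S3 evaluates the request in these contexts:
1.      User context – does the user's parent account grant this permission (user policy)?
2.      Bucket context – does the bucket owner allow the user to perform this operation on the bucket?
3.      Object context (only when operating on an object) – does the object owner allow it?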
-         Checks are:
                 §  Bucket policy (JSON)
                 §  Bucket ACL
                 §  Object ACL (if operating on an object)
-         https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
-         If the parent AWS account owns the resource (bucket / object), it can grant resource permissions to its IAM user by either the user policy OR the resource policy
-         If the bucket and object owners differ – the object OWNER can use the object ACL to allow/restrict access to that object
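The decision rules above (user context plus resource-owner context, same account vs cross-account) can be sketched as a simplified Python function. Account IDs are hypothetical, each policy is reduced to a single yes/no, and the explicit-deny rule comes from general IAM policy evaluation rather than from these notes; real evaluation also involves conditions, ACL details and more.

```python
def is_request_allowed(requester_account, resource_owner_account,
                       user_policy_allows, resource_grants_allow,
                       explicit_deny=False):
    """Simplified S3 access decision for an IAM user's request (a sketch).

    user_policy_allows:    the user's own account allows it (IAM user policy)
    resource_grants_allow: the resource owner allows it (bucket policy / ACL)
    """
    if explicit_deny:
        # An explicit Deny in any applicable policy overrides every Allow.
        return False
    if requester_account == resource_owner_account:
        # Same account: either the user policy OR the resource policy is enough.
        return user_policy_allows or resource_grants_allow
    # Cross-account: the user's parent account AND the resource owner must both allow.
    return user_policy_allows and resource_grants_allow


if __name__ == "__main__":
    # Account "111" calls into a bucket owned by account "222".
    print(is_request_allowed("111", "222", True, False))  # False: owner never granted it
    print(is_request_allowed("111", "222", True, True))   # True: both sides allow
```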

 

S3 Select
-         https://docs.aws.amazon.com/AmazonS3/latest/dev/selecting-content-from-objects.html
-         Amazon S3 feature that makes it easy to retrieve specific data from the contents of an object using simple SQL expressions without having to retrieve the entire object
-         With Amazon S3 Select, you can use simple structured query language (SQL) statements to filter the contents of Amazon S3 objects and retrieve just the subset of data that you need. By using Amazon S3 Select to filter this data, you can reduce the amount of data that Amazon S3 transfers, which reduces the cost and latency to retrieve this data

-         Amazon S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only), and server-side encrypted objects. You can specify the format of the results as either CSV or JSON, and you can determine how the records in the result are delimited
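A sketch of an S3 Select request made with boto3's `select_object_content()`, assuming a GZIP-compressed CSV object with a header row and JSON output. The bucket name, key, and column names (region, amount) are hypothetical. Building the parameters needs no AWS access; `run_select()` needs boto3 installed and valid credentials.

```python
def build_select_params(bucket, key, sql, compression="NONE"):
    """Keyword arguments for the boto3 S3 client's select_object_content().

    Assumes a CSV object with a header row; `compression` may be
    "NONE", "GZIP" or "BZIP2".
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # use the header row as column names
            "CompressionType": compression,
        },
        "OutputSerialization": {"JSON": {"RecordDelimiter": "\n"}},
    }


def run_select(params):
    """Send the request and return only the filtered records as text."""
    import boto3  # imported here so building params works without boto3 installed

    response = boto3.client("s3").select_object_content(**params)
    chunks = []
    for event in response["Payload"]:  # event stream: Records, Stats, End, ...
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    params = build_select_params(
        "example-bucket", "sales/2020.csv.gz",
        "SELECT s.region, s.amount FROM S3Object s WHERE s.region = 'EU'",
        compression="GZIP",
    )
    print(params["Expression"])
```

Only the matching rows travel over the network, which is where the cost and latency savings come from.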


