Aleks on AWS: Databases - Relational vs. Non-Relational

Relational Databases
-         Store of datasets, data access read/write is required on regular basis
-         Allows multiple user access
-         Safeguards against unintentional mistakes, power or h/w failures, can recover last known state
-         RELATIONAL
      §  data structure that allows to link info from different tables or diff types of data
      §  normalizes data into structures
      §  a schema is used to define tables/columns/indexes/table relationships
    §  same items in tables stores in same table locations (rows/columns)
    §  info about Entity can be saved data in multiple "RELATED" joined tables
      §  SQL is used
      §  Best suited for OLTP (on-line transaction processing) – transactional oriented apps
      §  MySQL is used in web application.
      §  Operational DBs: Oracle, DB2, SQL Server, MySQL
      §  Analytics: Oracle, AWS Redshift
-         Disadvantages:
      §  Inability to scale out – for web 2.0 or big data
      §  Require expensive hardware to scale up
      §  Data Warehouse – collection of data from multiple sources which then undergoes complex long queries for analytical decisions and reporting

Non-Relational Databases
-         Stores data w/out structures mechanisms to link data from different tables to one another
-         High-performance, non-schema based
-         Non-structured or semi-structured data (JSON, XML)
-         Storage/retrieval is modeled w/out tabular relations
-         Scale horizontally using distributed clusters
-         Cheaper hardware
-         Faster performance – read/write
-         Easier to develop
-         Automatically spread data over multiple servers
-         Multiple related values/entries can be stored in a one DB entry unlike RDBMS
-         Best suited for On Line Analytical Processing (OLAP): sales/management reports, analytics
-         Analytics: AWS Elastic MAP Reduce (ENMR) -> Analytics (Hadoop Storage)
-         Operational: AWN DynamoDB, Cassandra, Redis (Open source NoSQL – fastest)
-         Database types:
      §  Columnar – optimized for read/write columns not rows
      §  Reduces the amount of data to be loaded form disc
      §  Scale out using distributed clusters
      §  Reduce i/o requirements to read / write
      §  Ex: Apache Cassandra, Apache HBase
      §  Document Database – store data as documents, JSON / XML
      §  Schema for each doc can vary – more flexible to organize data, reduces need to store optional values elsewhere
      §  Scale out using distributed clusters
      §  Ex: DynamoDB – Used for ms latency read/writes: Mobile, Gaming, Ad-tech
      §  In-memory Key Value Store – used for read (not write) heavy app loads (social n/w, gaming, media sharing); compute-intensive workloads (recommendation engines)
      §  Store critical pieces of data in memory
      §  I/o intensive db queries
      §  Results of computationally intensive calculations

§ On AWS can use your own or use ElastiCasche (it includes in-mem caching engines: Memcached, Redis)

Aleks on AWS

Sunday, July 12, 2020

Databases - Relational vs. Non-Relational

No comments:

Post a Comment