Sunday, July 12, 2020

Databases - Relational vs. Non-Relational


Relational Databases
-         Store of datasets, data access read/write is required on regular basis
-         Allows multiple user access
-         Safeguards against unintentional mistakes, power or h/w failures, can recover last known state
-         RELATIONAL
          §  data structure that allows to link info from different tables or diff types of data
          §  normalizes data into structures
          §  a schema is used to define tables/columns/indexes/table relationships
                   §  same items in tables stores in same table locations (rows/columns)
                   §  info about Entity can be saved data in multiple "RELATED" joined tables
          §  SQL is used
          §  Best suited for OLTP (on-line transaction processing) – transactional oriented apps
          §  MySQL is used in web application.
          §  Operational DBs: Oracle, DB2, SQL Server, MySQL
          §  Analytics: Oracle, AWS Redshift
-         Disadvantages:
          §  Inability to scale out – for web 2.0 or big data
          §  Require expensive hardware to scale up
                  §  Data Warehouse – collection of data from multiple sources which then undergoes complex long queries for analytical decisions and reporting

Non-Relational Databases
-         Stores data w/out structures mechanisms to link data from different tables to one another
-         High-performance, non-schema based
-         Non-structured or semi-structured data (JSON, XML)
-         Storage/retrieval is modeled w/out tabular relations
-         Scale horizontally using distributed clusters
-         Cheaper hardware
-         Faster performance – read/write
-         Easier to develop
-         Automatically spread data over multiple servers
-         Multiple related values/entries can be stored in a one DB entry unlike RDBMS
-         Best suited for On Line Analytical Processing (OLAP): sales/management reports, analytics
-         Analytics: AWS Elastic MAP Reduce (ENMR) -> Analytics (Hadoop Storage)
-         Operational: AWN DynamoDB, Cassandra, Redis (Open source NoSQL – fastest)
-         Database types:
          §  Columnar – optimized for read/write columns not rows
                  §  Reduces the amount of data to be loaded form disc
                  §  Scale out using distributed clusters
                  §  Reduce i/o requirements to read / write
                  §  Ex: Apache Cassandra, Apache HBase
          §  Document Database – store data as documents, JSON / XML
                  §  Schema for each doc can vary – more flexible to organize data, reduces need to store optional values elsewhere
                  §  Scale out using distributed clusters
                  §  Ex: DynamoDB – Used for ms latency read/writes: Mobile, Gaming, Ad-tech
          §  In-memory Key Value Store – used for read (not write) heavy app loads (social n/w, gaming, media sharing); compute-intensive workloads (recommendation engines)
                  §  Store critical pieces of data in memory
                  §  I/o intensive db queries
                  §  Results of computationally intensive calculations
                  §  On AWS can use your own or use ElastiCasche (it includes in-mem caching engines: Memcached, Redis)

No comments:

Post a Comment