Friday, December 21, 2018

JTD-DesignerSeries-17-MongoDB-101


A Brief Context
Relational databases, with their three decades of legacy, have provided a robust foundation for monolithic applications. MongoDB, on the other hand, is a NoSQL database engine that lets you work with your data any way you want, scale in and out with your workloads, and run anywhere, from a single node to multiple clouds. MongoDB is a document-oriented database engine with drivers for languages such as Python, JavaScript, Ruby, and Java, which lets you build modern applications quickly.

Architecture

mongod is the daemon that forms MongoDB's core runtime process. It accepts connection requests and persists data to disk, and it is designed in architectural layers: MQL, the Document Data Model, and the Storage Layer.

MongoDB Query Language (MQL) - This layer translates the data and operations sent by client drivers as BSON messages into MongoDB instructions. It provides the operators and the aggregation engine that support CRUD activities.

MongoDB Document Data Model - This layer applies CRUD operations to the data structures across distributed replicas, handling requirements such as replication and durability.

Storage Layer - This layer handles disk-level system calls on a single node and provides features such as compression and encryption. WiredTiger is the default storage engine for MongoDB.

Security and Admin are cross-cutting layers that span the others, handling user management and database administration activities.

MongoDB is a distributed DBMS. It supports high availability and automatic failover by replicating data between the primary and secondary nodes of a replica set, and it addresses scalability through sharding, with the mongos router directing operations to shards that are themselves replica sets.
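Automatic failover depends on an election in which a candidate needs votes from a strict majority of the replica set's voting members. The arithmetic below is a minimal sketch of that rule only (the helper names are hypothetical, and it ignores priorities, arbiters, and non-voting members). It shows why replica sets usually have an odd number of members: a fourth member raises the majority without tolerating any extra failure.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be unreachable while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Can the members that are still reachable elect (or keep) a primary?
function canElectPrimary(votingMembers, reachable) {
  return reachable >= majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```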


Documents & Data Structures

Data in MongoDB is stored hierarchically: a database sits at the top level and contains one or more collections, and each collection contains documents; together, the documents represent your data set. MongoDB stores documents as BSON, a binary representation of JSON. Drivers for each language handle the translation between BSON and the language's native, JSON-like document representation, so applications rarely deal with BSON directly.
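To make "binary representation of JSON" concrete, here is a minimal Node.js sketch that encodes a flat document into BSON bytes. It is not a driver: it assumes only two element types from the BSON specification, UTF-8 strings (type 0x02) and 32-bit integers (type 0x10), and the function names are hypothetical. A BSON document is a little-endian int32 total length, followed by the encoded elements and a trailing 0x00 byte.

```javascript
// e_name / cstring: UTF-8 bytes followed by a 0x00 terminator
function encodeCString(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0x00])]);
}

// Encode one field; only strings and 32-bit integers are supported in this sketch
function encodeElement(name, value) {
  if (typeof value === "string") {
    const str = Buffer.from(value, "utf8");
    const len = Buffer.alloc(4);
    len.writeInt32LE(str.length + 1); // string length includes its trailing 0x00
    return Buffer.concat([Buffer.from([0x02]), encodeCString(name), len, str, Buffer.from([0x00])]);
  }
  if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
    const num = Buffer.alloc(4);
    num.writeInt32LE(value);
    return Buffer.concat([Buffer.from([0x10]), encodeCString(name), num]);
  }
  throw new TypeError(`unsupported value for field "${name}"`);
}

// Document: int32 total length + elements + 0x00
function encodeDocument(doc) {
  const body = Buffer.concat(Object.entries(doc).map(([k, v]) => encodeElement(k, v)));
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1);
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

console.log(encodeDocument({ age: 26 }).toString("hex"));
// 0e00000010616765001a00000000
```

Real drivers and BSON libraries support many more types (ObjectId, dates, doubles, arrays, embedded documents), which is exactly the work the drivers take off your hands.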

CRUD Operations
a) Create new documents in a collection.
db.users.insertOne({name:"sue", age: 26, status: "pending"})

b) Retrieve documents in a collection with query filters.
db.users.find({age: {$gt: 18}})

c) Modify existing documents in a collection.
db.users.updateMany({age: {$lt: 18}}, {$set: { status: "reject"}})

d) Remove documents from a collection.
db.users.deleteMany({status: "reject"})
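The four shell calls above can be mimicked with a tiny in-memory collection, which helps show what the filters and the $set operator actually do. This is a hypothetical sketch, not how MongoDB evaluates queries: it supports only plain equality, $gt, $lt, and $set, does not add an _id field, and returns simple counts instead of the result documents the shell returns.

```javascript
// Does a document satisfy a filter? Supports plain equality plus $gt / $lt only.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, operand]) => {
        if (op === "$gt") return doc[field] > operand;
        if (op === "$lt") return doc[field] < operand;
        throw new Error(`operator ${op} not supported in this sketch`);
      });
    }
    return doc[field] === cond;
  });
}

class InMemoryCollection {
  constructor() {
    this.docs = [];
  }
  insertOne(doc) {
    this.docs.push({ ...doc });
  }
  find(filter = {}) {
    return this.docs.filter((d) => matches(d, filter));
  }
  updateMany(filter, update) {
    let modified = 0;
    for (const d of this.docs) {
      if (matches(d, filter)) {
        Object.assign(d, update.$set); // only $set is supported here
        modified++;
      }
    }
    return modified;
  }
  deleteMany(filter) {
    const before = this.docs.length;
    this.docs = this.docs.filter((d) => !matches(d, filter));
    return before - this.docs.length;
  }
}

const users = new InMemoryCollection();
users.insertOne({ name: "sue", age: 26, status: "pending" });
users.insertOne({ name: "tom", age: 15, status: "pending" }); // hypothetical second user
console.log(users.find({ age: { $gt: 18 } }).map((u) => u.name).join(", ")); // sue
console.log(users.updateMany({ age: { $lt: 18 } }, { $set: { status: "reject" } })); // 1
console.log(users.deleteMany({ status: "reject" })); // 1
```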


Aggregation Framework
The aggregation framework is modeled on the concept of data-processing pipelines: a pipeline is a sequence of stages, each of which takes documents as input, filters or transforms them, and passes its output to the next stage, producing aggregated results.

Syntax:
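db.<collection>.aggregate([{<stage1>}, {<stage2>}, {<stage3>}], {<options>})
For example: db.users.aggregate([{$match: {age: {$gt: 18}}}, {$group: {_id: "$status", count: {$sum: 1}}}])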
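The pipeline idea can be sketched in plain JavaScript by treating each stage as a function from an array of documents to a new array, and the pipeline as a fold over those functions. The stage helpers here (match, groupCount, sortBy) are hypothetical stand-ins that loosely mirror $match (equality only), a counting $group, and $sort; they are not MongoDB's aggregation engine.

```javascript
// Each hypothetical stage is a function: array of documents in, array out.
const match = (filter) => (docs) =>
  docs.filter((d) => Object.entries(filter).every(([k, v]) => d[k] === v));

// Groups by one field and counts documents per group, like
// { $group: { _id: "$<field>", count: { $sum: 1 } } }
const groupCount = (field) => (docs) => {
  const counts = new Map();
  for (const d of docs) counts.set(d[field], (counts.get(d[field]) || 0) + 1);
  return [...counts].map(([id, count]) => ({ _id: id, count }));
};

// Sorts by one field; order 1 = ascending, -1 = descending (as in $sort)
const sortBy = (field, order) => (docs) =>
  [...docs].sort((a, b) => (a[field] < b[field] ? -order : a[field] > b[field] ? order : 0));

// The pipeline feeds the output of each stage into the next one
const aggregate = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const users = [
  { name: "sue", age: 26, status: "pending" },
  { name: "bob", age: 31, status: "active" },
  { name: "ann", age: 42, status: "active" },
];

console.log(JSON.stringify(aggregate(users, [groupCount("status"), sortBy("count", -1)])));
// [{"_id":"active","count":2},{"_id":"pending","count":1}]
console.log(aggregate(users, [match({ status: "active" }), sortBy("age", -1)]).map((u) => u.name).join(", "));
// ann, bob
```

Because every stage has the same shape, stages can be reordered or added freely; in the real framework, placing $match early similarly reduces the number of documents later stages have to process.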


Replication
Replication provides redundancy and increases data availability. With multiple copies of the data on different servers, replication provides a level of fault tolerance against the loss of a single database server. A replica set in MongoDB is a group of mongod processes that maintain the same data set. MongoDB uses a statement-based replication mechanism: the primary records each database operation as one or more idempotent entries in its operations log (oplog), and the secondaries replay those entries to replicate the data.
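Why do oplog entries need to be idempotent? A secondary may end up applying the same entry more than once (for example, after a retry), so replaying it must not change the result. A common way to get there is to log the outcome of an operation rather than the operation itself, e.g. recording an increment as "set the field to its new value". The sketch below illustrates that idea with a hypothetical, simplified entry format; it is not the actual oplog document format.

```javascript
// Primary side: apply an increment, but log the *resulting value* as a set.
function applyOnPrimary(store, oplog, id, incBy) {
  const newValue = (store.get(id) || 0) + incBy;
  store.set(id, newValue);
  oplog.push({ op: "set", id, value: newValue }); // idempotent entry
}

// Secondary side: replaying a "set" entry again leaves the value unchanged.
function applyEntry(store, entry) {
  if (entry.op === "set") store.set(entry.id, entry.value);
}

const primary = new Map();
const secondary = new Map();
const oplog = [];

applyOnPrimary(primary, oplog, "counter", 1);
applyOnPrimary(primary, oplog, "counter", 1);

// Replay the whole oplog, then replay it again (e.g., after a retry)
for (const e of oplog) applyEntry(secondary, e);
for (const e of oplog) applyEntry(secondary, e);

console.log(primary.get("counter"), secondary.get("counter")); // 2 2
```

Had the log stored the raw increments instead, the double replay would have left the secondary at 4 while the primary holds 2.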

Sharding
Database growth can be supported by either vertical or horizontal scaling. Vertical scaling increases the resources (CPU, memory, storage) of a single node, whereas horizontal scaling splits data across multiple nodes.
Sharding is a method of dividing data across multiple shards, each typically a replica set, based on a shard key. MongoDB implements sharding by routing queries through mongos and storing metadata and configuration settings in config servers.
It is important to pick a good shard key: prefer high cardinality and low frequency (few documents sharing the same key value), and avoid keys that change monotonically, such as timestamps, because they send all new inserts to the same shard. In some cases a hashed shard key may be beneficial, but then you lose the ability to target range queries on the shard key, and mongos has to broadcast them to all shards as scatter-gather queries.
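The trade-off between ranged and hashed shard keys can be sketched with a toy router. Everything here is hypothetical: three shard names, fixed chunk boundaries, and a 32-bit FNV-1a hash with modulo placement chosen for brevity (MongoDB uses its own hash function and splits the hashed-value space into chunk ranges). The point is only the routing behavior: a range query over a ranged key touches the overlapping chunks, while the same query over a hashed key must go everywhere.

```javascript
// Hypothetical cluster: three shards, each owning a contiguous range of shard-key values.
const rangeChunks = [
  { shard: "shardA", min: -Infinity, max: 100 },
  { shard: "shardB", min: 100, max: 200 },
  { shard: "shardC", min: 200, max: Infinity },
];
const SHARDS = ["shardA", "shardB", "shardC"];

// Toy 32-bit FNV-1a hash (not MongoDB's hash function).
function fnv1a(value) {
  let h = 0x811c9dc5;
  for (const ch of String(value)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Ranged key: a range query only touches chunks that overlap [lo, hi).
function targetRanged(lo, hi) {
  return rangeChunks.filter((c) => c.min < hi && lo < c.max).map((c) => c.shard);
}

// Hashed key: an equality query can still be routed to a single shard...
function targetHashedEquality(value) {
  return [SHARDS[fnv1a(value) % SHARDS.length]];
}

// ...but a range query cannot, because neighboring values hash far apart.
function targetHashedRange(_lo, _hi) {
  return [...SHARDS]; // scatter-gather: ask every shard
}

console.log(targetRanged(120, 150).join(", ")); // shardB
console.log(targetHashedEquality(120).length); // 1
console.log(targetHashedRange(120, 150).length); // 3
```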

Storage
A storage engine is the component that manages how data is stored on disk. WiredTiger is the default storage engine for MongoDB, but there are other options such as In-Memory and MMAPv1 (deprecated as of MongoDB 4.0). WiredTiger provides atomicity at the document level for write operations and periodically writes snapshot data to disk as a checkpoint. If a failure occurs while a new checkpoint is being written, MongoDB can recover from the last valid checkpoint and use the journal to replay the writes made since that checkpoint. With WiredTiger, MongoDB supports compression for all collections and indexes.
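The checkpoint-plus-journal recovery described above can be illustrated with a toy key-value engine. This is a hypothetical sketch of the idea, not WiredTiger's design: it treats the journal as perfectly durable and simply truncates it at every checkpoint, both of which are simplifications.

```javascript
// Hypothetical toy engine: in-memory data, a durable snapshot, and a journal.
class ToyEngine {
  constructor() {
    this.memory = new Map();     // current in-memory data
    this.checkpoint = new Map(); // last durable snapshot on "disk"
    this.journal = [];           // writes made since that snapshot
  }

  write(key, value) {
    this.journal.push({ key, value }); // journal first, so the write can be replayed
    this.memory.set(key, value);
  }

  takeCheckpoint() {
    this.checkpoint = new Map(this.memory); // persist a consistent snapshot
    this.journal = [];                      // older journal entries are no longer needed
  }

  crash() {
    this.memory = null; // in-memory state is lost; checkpoint and journal survive
  }

  recover() {
    this.memory = new Map(this.checkpoint); // start from the last checkpoint
    for (const { key, value } of this.journal) this.memory.set(key, value); // replay newer writes
  }
}

const engine = new ToyEngine();
engine.write("a", 1);
engine.takeCheckpoint();
engine.write("b", 2); // after the checkpoint: recorded only in the journal
engine.crash();
engine.recover();
console.log(engine.memory.get("a"), engine.memory.get("b")); // 1 2
```

Without the journal, recovery would stop at the checkpoint and the write to "b" would be lost; the journal is what covers the gap between checkpoints.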
