Evolution of MongoDB Replica Set
and Its Best Practices
Manosh Malai
CTO, Mydbops
28th August 2021
Mydbops 8th Webinar
About Me
Manosh Malai
CTO, Mydbops IT Solution
Interested in Open Source technologies
Interested in MongoDB, DevOps & DevOpSec Practices
Tech Speaker/Blogger
Mydbops Services
Focuses on MySQL, MongoDB and PostgreSQL
Consulting Services
Managed Services
Our Clients
500+ Clients in 5 Yrs. of Operations
Agenda
Introduction
MongoDB Replica Set Implementation Best Practices
MongoDB Evolution and Its Exciting Features 4.0 to 5.0
INTRODUCTION
Scaling MongoDB
▪ Is MongoDB fit for large data: MongoDB is designed to efficiently handle large datasets through vertical and horizontal scaling.
▪ Horizontal Scaling: adding nodes to share the load, which MongoDB achieves primarily through sharding.
▪ Vertical Scaling: adding CPU, RAM, and I/O capacity to increase the processing capability of a single server or cluster (replica set).
Vertical and Horizontal Scaling
Vertical Scaling
Horizontal Scaling
MONGODB REPLICATION AND ITS BEST PRACTICES
Asynchronous Replication
[Diagram: a client WRITE goes to the PRIMARY, which sends a REPL-WRITE to each of two SECONDARY members]
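Automatic Failover
[Diagram: PRIMARY and two SECONDARY members exchange heartbeats; after failover, one SECONDARY becomes the new PRIMARY and replicates (REPL) to the other SECONDARY]
electionTimeoutMillis of 10 sec (default) + about 2 sec to select a new primary = a median of 12 sec to elect a new primary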
Odd Number Of Replica Members
▪ A replica set can have at most 50 members
▪ Only 7 members can vote
▪ The voting member count should always be odd
Cont...
Number of Voting Members    Majority    Number of Tolerable Failures
1                           1           0
2                           2           0
3                           2           1
4                           3           1
5                           3           2
6                           4           2
7                           4           3
Even number of nodes: (N/2)+1 = majority
e.g. (6/2)+1 = 4
Odd number of nodes: (N+1)/2 = majority
e.g. (7+1)/2 = 4
Number of members alive < majority = cannot elect a primary, and writes fail
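The majority rule above can be sketched in a few lines of JavaScript. This is a minimal illustration, not MongoDB code; the function names are made up for this example. Both the even and odd formulas reduce to floor(N/2) + 1.

```javascript
// Majority of N voting members: floor(N / 2) + 1.
// This equals (N / 2) + 1 for even N and (N + 1) / 2 for odd N.
function majority(votingMembers) {
  if (!Number.isInteger(votingMembers) || votingMembers < 1) {
    throw new RangeError("votingMembers must be a positive integer");
  }
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can fail while a majority is still alive.
function tolerableFailures(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// A primary can be elected only while the alive voters reach the majority.
function canElectPrimary(votingMembers, aliveMembers) {
  return aliveMembers >= majority(votingMembers);
}

// Print the same rows as the table (1 to 7 voting members).
for (let n = 1; n <= 7; n++) {
  console.log(
    `${n} voting: majority ${majority(n)}, tolerates ${tolerableFailures(n)} failure(s)`
  );
}
```

Note that going from 3 to 4 voting members raises the majority without raising the number of tolerable failures, which is why an odd count is preferred.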
READ And WRITE Replica Settings
Read Preference
• Primary
• Primary Preferred
• Secondary
• Secondary Preferred
• Nearest
Write Concern
• w: majority
• w: <N>
• j: true
Secondary Member Types
Type          Accepts Reads    Can Vote    Can Become Primary
Priority 0    Yes              Yes         No
Hidden        No               Yes         No
Delayed       No               Yes         No
Arbiter       No               Yes         No
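As a rough sketch of how these member types appear in a replica set configuration document, the object below uses the standard members[n] fields (priority, hidden, secondaryDelaySecs, arbiterOnly). The set name and host names are hypothetical, and the two helper functions are illustrative only. secondaryDelaySecs is the MongoDB 5.0 field name; earlier versions call it slaveDelay.

```javascript
// Hypothetical replica set configuration; set name and hosts are placeholders.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1.example.net:27017" },
    { _id: 1, host: "mongo2.example.net:27017" },
    { _id: 2, host: "mongo3.example.net:27017" },
    // Priority 0: accepts reads and votes, but never becomes primary.
    { _id: 3, host: "mongo4.example.net:27017", priority: 0 },
    // Hidden: invisible to clients, still votes; must have priority 0.
    { _id: 4, host: "mongo5.example.net:27017", priority: 0, hidden: true },
    // Delayed: applies the oplog one hour late (field is slaveDelay before 5.0).
    { _id: 5, host: "mongo6.example.net:27017", priority: 0, hidden: true, secondaryDelaySecs: 3600 },
    // Arbiter: holds no data, only votes.
    { _id: 6, host: "mongo7.example.net:27017", arbiterOnly: true },
  ],
};

// Members vote unless votes is explicitly 0 (the default is 1).
function votingMembers(config) {
  return config.members.filter((m) => (m.votes ?? 1) > 0);
}

// Members that can become primary: data-bearing with priority > 0 (default 1).
function electableMembers(config) {
  return config.members.filter((m) => !m.arbiterOnly && (m.priority ?? 1) > 0);
}

console.log("voting:", votingMembers(replicaSetConfig).length);
console.log("electable:", electableMembers(replicaSetConfig).length);
```

In the mongo shell, an object of this shape would be passed to rs.initiate() or rs.reconfig(). The example keeps the voting count at 7, the maximum, and odd.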
Replica Set Best Practices
▪ Use hostnames rather than IP addresses when configuring replica set members
▪ Ensure that the replica set has an odd number of voting members
▪ Maintain an oplog recovery window of at least 24 hours
▪ 3 types of connection URI
▪ Consistent reads: primary
▪ Eventually consistent reads: secondaryPreferred, maxStalenessSeconds
▪ Write concern w: 1
▪ Use the nearest read preference, tag sets, and the maxStalenessSeconds read setting for geographically distributed members
db.collection.find().readPref('nearest', [ { 'dc': 'east' } ])
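One possible reading of the connection URI bullets is one URI per workload. The sketch below builds three such strings using standard connection string options (replicaSet, readPreference, readPreferenceTags, maxStalenessSeconds, w). The host names, the set name rs0, the dc:east tag and the 120-second staleness value are assumptions for illustration; maxStalenessSeconds must be at least 90.

```javascript
// Hypothetical hosts; replace with the replica set's own hostnames.
const hosts = [
  "mongo1.example.net:27017",
  "mongo2.example.net:27017",
  "mongo3.example.net:27017",
];

// Join hosts and options into a mongodb:// connection string.
function buildUri(hostList, options) {
  const query = Object.entries(options)
    .map(([key, value]) => `${key}=${value}`)
    .join("&");
  return `mongodb://${hostList.join(",")}/?${query}`;
}

// Consistent reads: always read from the primary.
const consistentUri = buildUri(hosts, {
  replicaSet: "rs0",
  readPreference: "primary",
  w: 1,
});

// Eventually consistent reads: prefer secondaries that are at most 120 s behind.
const eventualUri = buildUri(hosts, {
  replicaSet: "rs0",
  readPreference: "secondaryPreferred",
  maxStalenessSeconds: 120,
});

// Geographically distributed members: nearest member tagged dc:east.
const nearestUri = buildUri(hosts, {
  replicaSet: "rs0",
  readPreference: "nearest",
  readPreferenceTags: "dc:east",
  maxStalenessSeconds: 120,
});

console.log(consistentUri);
console.log(eventualUri);
console.log(nearestUri);
```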
Replica Set Best Practices - 2
▪ Use x.509 Certificate for Membership Authentication
security:
  clusterAuthMode: x509
net:
  tls:
    mode: requireTLS
    certificateKeyFile: <path to its TLS/SSL certificate and key file>
    CAFile: <path to root CA PEM file to verify received certificate>
    clusterFile: <path to its certificate key file for membership authentication>
  bindIp: localhost,<hostname(s)|ip address(es)>
Replica Set Best Practices - 3
• Enable Authorization
• Create different roles for Database Administration, Operations and Admin
OPS User:
▪ List databases (show dbs)
▪ List collections (show collections), except in the admin, local and config databases
▪ Read collection data (db.coll.find())
▪ Check collection stats (db.coll.stats())
▪ Check db stats (db.stats())
▪ Create indexes
▪ See the currently running queries (db.currentOp())
▪ Kill queries
▪ See the replication status
▪ See the list of users
▪ See the inherited privileges of each role
▪ Rotate the log file
DBA User: everything the OPS User can do, plus
▪ Drop indexes
▪ Shut down mongod
▪ Lock writes
▪ Configure the replica set
▪ Change the replica set IP
▪ Run compaction against a collection
Super User:
▪ ALL ACCESS (root)
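A hedged sketch of how the OPS and DBA roles might be expressed as role documents follows. The action names are built-in MongoDB privilege actions, but the role names (opsRole, dbaRole), the application database appdb and the exact grouping of privileges are assumptions made for this example, not the presenter's actual definitions.

```javascript
// Hypothetical application database the roles apply to.
const appDb = "appdb";

// OPS role: read-only operations plus index creation and query management.
const opsRole = {
  role: "opsRole",
  privileges: [
    {
      resource: { cluster: true },
      actions: ["listDatabases", "inprog", "killop", "replSetGetStatus", "logRotate"],
    },
    {
      resource: { db: appDb, collection: "" },
      actions: ["listCollections", "find", "collStats", "dbStats", "createIndex", "viewUser", "viewRole"],
    },
  ],
  roles: [],
};

// DBA role: inherits the OPS role and adds the administrative actions.
const dbaRole = {
  role: "dbaRole",
  privileges: [
    {
      resource: { cluster: true },
      actions: ["shutdown", "fsync", "replSetConfigure"],
    },
    {
      resource: { db: appDb, collection: "" },
      actions: ["dropIndex", "compact"],
    },
  ],
  roles: [{ role: "opsRole", db: "admin" }],
};

console.log(JSON.stringify(opsRole, null, 2));
console.log(JSON.stringify(dbaRole, null, 2));
```

In the mongo shell, each document would be passed to db.getSiblingDB("admin").createRole(...). Scoping the database privileges to named application databases, rather than to every database, is what keeps admin, local and config out of reach.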
Replica Set Best Practices - 4
▪ The mongod service should run under a non-privileged account with a nologin/false shell.
▪ Do NOT allow MongoDB to talk to the internet under any circumstances
  ▪ Configure security groups to block outbound connections to the internet (network level)
  ▪ Configure IPTABLES/UFW to block/control outbound traffic (instance level)
▪ Use the XFS filesystem
▪ Turn off atime for the storage volume that holds the database files
  ▪ <MongoDB Data Partition> xfs rw,noatime,attr2,inode64,noquota 0 0
▪ Disable Transparent Huge Pages; MongoDB performs better with normal virtual memory pages.
  ▪ $ echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
  ▪ $ echo "never" > /sys/kernel/mm/transparent_hugepage/defrag
Replica Set Best Practices - 5
▪ Disable NUMA in your BIOS or invoke mongod with NUMA disabled.
▪ Edit /etc/systemd/system/multi-user.target.wants/mongod.service
▪ ExecStart=/usr/bin/numactl --interleave=all /usr/bin/mongod --config /etc/mongod.conf
▪ Ensure that readahead settings for the block devices that store the database files are relatively small as most
access is non-sequential. For example, setting readahead to 32 (16KB) is a good starting point.
▪ Apply these ulimit settings:
-f (file size): unlimited
-t (cpu time): unlimited
-v (virtual memory): unlimited
-n (open files): 64000
-m (memory size): unlimited
-u (processes/threads): 32000
Replica Set Best Practices - 6
• Network Stack Tuning
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_probes = 6
• Dirty Ratio
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
MONGODB EVOLUTION AND ITS RECENT EXCITING FEATURES
Evolution of MongoDB Replica Set
Resumable Initial Sync - From MongoDB 4.4
[Diagram: a SECONDARY performing initial sync from the PRIMARY]
▪ Initial sync can attempt to resume the sync process if it is interrupted by a
• network error
• collection drop
• collection rename
▪ The secondary tries to resume initial sync for 24 hours (default)
▪ The retry period can be changed with:
db.adminCommand( { setParameter: 1, initialSyncTransientErrorRetryPeriodSeconds: <value> } )
Resumable Initial Sync Monitoring - From MongoDB 4.4
Streaming Replication - From MongoDB 4.4
Streaming Replication - From MongoDB 4.4
Before 4.4:
• A single OplogFetcher thread on the secondary actively sends getMore commands against the primary's oplog collection
• If there is data, a batch of up to 16 MB is returned
• Each batch acquisition requires a complete network round trip (RTT)
• On a poor replica set network, replication performance is severely limited by network latency
From 4.4:
• Incremental oplog entries flow continuously to the secondary node, instead of relying on active polling by the secondary
• Compared with the previous method, at least half of the RTT is saved in the oplog sync process
• Majority write performance is reported to increase by about 50% on average
• Streaming replication is controlled by the oplogFetcherUsesExhaust parameter (true/false, default true), which can only be set at startup:
mongod --setParameter oplogFetcherUsesExhaust=<true|false>
Minimum Oplog Retention Period - From MongoDB 4.4
From 3.6:
• { replSetResizeOplog: <boolean>, size: <num MB> }
• db.adminCommand({replSetResizeOplog:1, size: 16384})
• db.getSiblingDB("local").oplog.rs.stats(1024*1024).maxSize
From 4.4:
• db.adminCommand({replSetResizeOplog: <int>, size: <double>, minRetentionHours: <double>})
• db.adminCommand({replSetResizeOplog: 1, size: 20000, minRetentionHours: 1.5})
• db.getSiblingDB("local").oplog.rs.stats(1024*1024).maxSize
• db.getSiblingDB("admin").serverStatus().oplogTruncation.oplogMinRetentionHours
Minimum Oplog Retention Period - From MongoDB 4.4
• When a minimum retention period is configured and write volume is high, the oplog may grow beyond its configured maximum size in order to keep the oplog entries for that period.
• From MongoDB 4.0 onward, MongoDB forbids you from dropping the local.oplog.rs collection.
• The oplog size can be set from 990 megabytes to 1 petabyte.
• Reducing the oplog size does not automatically reclaim disk space; compact must be run on the oplog.rs collection in the local database.
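The size limits and the optional retention period can be checked before the command is sent. The helper below is a minimal sketch: resizeOplogCommand is a made-up name, and it only builds the command document that would be passed to db.adminCommand() in the mongo shell.

```javascript
// Oplog size limits in megabytes: 990 MB up to 1 PB (1024^3 MB).
const MIN_OPLOG_MB = 990;
const MAX_OPLOG_MB = 1024 * 1024 * 1024;

// Build a replSetResizeOplog command document.
// minRetentionHours is optional and only understood by MongoDB 4.4+.
function resizeOplogCommand(sizeMB, minRetentionHours) {
  if (typeof sizeMB !== "number" || sizeMB < MIN_OPLOG_MB || sizeMB > MAX_OPLOG_MB) {
    throw new RangeError(`size must be between ${MIN_OPLOG_MB} and ${MAX_OPLOG_MB} MB`);
  }
  const command = { replSetResizeOplog: 1, size: sizeMB };
  if (minRetentionHours !== undefined) {
    if (!Number.isFinite(minRetentionHours) || minRetentionHours < 0) {
      throw new RangeError("minRetentionHours must be a non-negative number");
    }
    command.minRetentionHours = minRetentionHours;
  }
  return command;
}

// Same values as the 4.4 example on the slide: 20000 MB, 1.5 hours.
console.log(JSON.stringify(resizeOplogCommand(20000, 1.5)));
```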
Mirrored Reads - From MongoDB 4.4
• The primary copies a sample of its read traffic to electable secondary nodes at a configurable ratio
• This warms the secondary's cache so that it closely resembles the primary's cache
• When the primary goes down, a secondary that received mirrored reads can take over and serve the traffic
• This reduces cache misses and disk load after failover, keeping query performance close to that of the previous primary
• Mirrored reads are "fire-and-forget" operations; i.e., the primary does not await the response to a mirrored read
• Only electable secondaries (members[n].priority greater than 0) receive mirrored reads
• A sampling rate of "0.0" disables mirrored reads
• A sampling rate between "0.0" and "1.0" results in the primary forwarding that fraction of mirrorable reads, chosen at random
• A sampling rate of "1.0" results in the primary forwarding all mirrorable reads
Mirrored Reads - From MongoDB 4.4
• db.adminCommand( { setParameter: 1, mirrorReads: { samplingRate: 0.10 } } )
• db.runCommand( { serverStatus: 1, mirroredReads: 1 } )
• Mirrored reads support the following operations:
• count
• distinct
• find
• findAndModify (Specifically, the filter is sent as a mirrored read)
• update (Specifically, the filter is sent as a mirrored read)
Simultaneous Indexing - From MongoDB 4.4
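• Before version 4.4, an index build was replicated to the secondary nodes and started there only after it had completed on the primary
• From 4.4, indexes build simultaneously on all data-bearing replica set members
• Index build process
1. The primary receives the createIndexes command and writes a "startIndexBuild" oplog entry
2. Each secondary starts its build after replicating the "startIndexBuild" entry
3. Each member votes to commit once its own index build has finished
4. The primary checks for the quorum of votes and for any key constraint violations
5. The primary writes a "commitIndexBuild" oplog entry on success, or an "abortIndexBuild" entry otherwise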
Simultaneous Indexing - From MongoDB 4.4
Index Creation Command:
db.getSiblingDB("examples").invoices.createIndexes(
[
{ "invoices" : 1 },
{ "fulfillmentStatus" : 1 }
]
)
Setting Index Commit Quorum:
db.getSiblingDB("examples").runCommand(
{
"setIndexCommitQuorum" : "invoices",
"indexNames" : ["invoices_1", "fulfillmentStatus_1"],
"commitQuorum" : "majority"
}
)
• By default, index builds use the "votingMembers" commit quorum, i.e. all data-bearing voting replica set members
• Do not use killOp to terminate an in-progress index build in replica sets or sharded clusters
• Starting from 4.2, use db.pets.dropIndex( "catIdx" ) to drop an index
• When dropIndexes is run on the primary, it creates an associated "abortIndexBuild" oplog entry
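To make the commit quorum setting concrete, the sketch below computes how many member votes each quorum value requires. It is an illustration of the rule, not server code: the function names are invented, and it handles only "votingMembers", "majority" and integer quorums. MongoDB also accepts a replica set tag name, which this example does not cover.

```javascript
// Number of "commit" votes required for a given commitQuorum value,
// where members is the count of data-bearing voting members.
function requiredVotes(commitQuorum, members) {
  if (!Number.isInteger(members) || members < 1) {
    throw new RangeError("members must be a positive integer");
  }
  if (commitQuorum === "votingMembers") {
    return members; // default: every data-bearing voting member
  }
  if (commitQuorum === "majority") {
    return Math.floor(members / 2) + 1;
  }
  if (Number.isInteger(commitQuorum) && commitQuorum >= 0 && commitQuorum <= members) {
    return commitQuorum; // explicit member count
  }
  throw new RangeError("unsupported commitQuorum in this sketch");
}

// True once enough members have voted to commit their finished index build.
function quorumReached(commitQuorum, members, votesReceived) {
  return votesReceived >= requiredVotes(commitQuorum, members);
}

console.log(requiredVotes("votingMembers", 5));
console.log(requiredVotes("majority", 5));
console.log(quorumReached("majority", 5, 2));
```

With "majority", a slow or unreachable secondary no longer blocks the commit, at the cost of that member finishing its build later.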
Reach Us : Info@mydbops.com
Thank You
Ad

More Related Content

Similar to Evolution Of MongoDB Replicaset (20)

MongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetMongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-Set
Vivek Parihar
 
MongoDB at MapMyFitness
MongoDB at MapMyFitnessMongoDB at MapMyFitness
MongoDB at MapMyFitness
MapMyFitness
 
The Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb ClusterThe Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb Cluster
Chris Henry
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL Database
FITC
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
Alfresco Software
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
LetsConnect
 
MongoDB at MapMyFitness from a DevOps Perspective
MongoDB at MapMyFitness from a DevOps PerspectiveMongoDB at MapMyFitness from a DevOps Perspective
MongoDB at MapMyFitness from a DevOps Perspective
MongoDB
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101
Mark Kromer
 
Functional? Reactive? Why?
Functional? Reactive? Why?Functional? Reactive? Why?
Functional? Reactive? Why?
Aleksandr Tavgen
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
RahulBhole12
 
Perforce Administration: Optimization, Scalability, Availability and Reliability
Perforce Administration: Optimization, Scalability, Availability and ReliabilityPerforce Administration: Optimization, Scalability, Availability and Reliability
Perforce Administration: Optimization, Scalability, Availability and Reliability
Perforce
 
Improving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve InternetImproving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve Internet
Achieve Internet
 
Improving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve InternetImproving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve Internet
Achieve Internet
 
Scalable Web Apps
Scalable Web AppsScalable Web Apps
Scalable Web Apps
Piotr Pelczar
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster Recovery
MongoDB
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
Perforce
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
johnrjenson
 
MySQL Performance Metrics that Matter
MySQL Performance Metrics that MatterMySQL Performance Metrics that Matter
MySQL Performance Metrics that Matter
Morgan Tocker
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
MongoDB
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster Recovery
Steven Francia
 
MongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetMongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-Set
Vivek Parihar
 
MongoDB at MapMyFitness
MongoDB at MapMyFitnessMongoDB at MapMyFitness
MongoDB at MapMyFitness
MapMyFitness
 
The Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb ClusterThe Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb Cluster
Chris Henry
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL Database
FITC
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
Alfresco Software
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
LetsConnect
 
MongoDB at MapMyFitness from a DevOps Perspective
MongoDB at MapMyFitness from a DevOps PerspectiveMongoDB at MapMyFitness from a DevOps Perspective
MongoDB at MapMyFitness from a DevOps Perspective
MongoDB
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101
Mark Kromer
 
Functional? Reactive? Why?
Functional? Reactive? Why?Functional? Reactive? Why?
Functional? Reactive? Why?
Aleksandr Tavgen
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
RahulBhole12
 
Perforce Administration: Optimization, Scalability, Availability and Reliability
Perforce Administration: Optimization, Scalability, Availability and ReliabilityPerforce Administration: Optimization, Scalability, Availability and Reliability
Perforce Administration: Optimization, Scalability, Availability and Reliability
Perforce
 
Improving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve InternetImproving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve Internet
Achieve Internet
 
Improving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve InternetImproving Website Performance with Memecached Webinar | Achieve Internet
Improving Website Performance with Memecached Webinar | Achieve Internet
Achieve Internet
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster Recovery
MongoDB
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
Perforce
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
johnrjenson
 
MySQL Performance Metrics that Matter
MySQL Performance Metrics that MatterMySQL Performance Metrics that Matter
MySQL Performance Metrics that Matter
Morgan Tocker
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
MongoDB
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster Recovery
Steven Francia
 

Recently uploaded (20)

Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
The Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLabThe Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLab
Journal of Soft Computing in Civil Engineering
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
Smart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptxSmart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptx
rushikeshnavghare94
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Compiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptxCompiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptx
RushaliDeshmukh2
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
Smart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptxSmart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptx
rushikeshnavghare94
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Compiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptxCompiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptx
RushaliDeshmukh2
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
Ad

Evolution Of MongoDB Replicaset

  • 1. Evolution of MongoDB Replica Set and Its Best Practices Manosh Malai CTO, Mydbops 28Th August 2021 Mydbops 8th Webinar
  • 2. Interested in Open Source technologies Interested in MongoDB, DevOps & DevOpSec Practices Tech Speaker/Blogger CTO, Mydbops IT Solution Manosh Malai About Me
  • 3. Consulting Services Managed Services Focuses on MySQL, MongoDB and PostgreSQL Mydbops Services
  • 4. 500 + Clients In 5 Yrs. of Operations Our Clients
  • 5. MongoDB Evolution and Its Exciting Features 4.0 to 5.0 MongoDB Replica Set Implementation Best Practices Introduction Agenda
  • 7. Scaling MongoDB MongoDB is designed to effectienly handle large dataset through vertical and horizontal scaling Additional node to share the load, MongoDB achieved primarily through Sharding Vertical scaling refers to the use of CPU, RAM, and I/O to increase the processing capability of a single server or cluster(Replica Set). ▪ Is MongoDB fit for large data ▪ Horizontal Scaling ▪ Vertical Scaling
  • 8. Vertical and Horizontal Scaling Vertical Scaling Horizontal Scaling
  • 9. MONGODB REPLICATION AND IT'S BEST PRACTICES
  • 11. Automatic Failover PRIMARY SECONDARY SECONDARY Heartbeat SECONDARY Heartbeat PRIMARY REPL electionTimeoutMillis 10 Sec(Default) + 2 sec to select New Primary = 12 Sec Median to elect New Primary
  • 12. Odd Number Of Replica Member 50 Members Max 7 Members Only can Vote Members count always Odd
  • 13. Cont... Number of Voting Member Majority Numberof Tolerable Failure 1 1 0 2 2 0 3 2 1 4 3 1 5 3 2 6 4 2 7 4 3 Even Node: (N/2)+1 = Majority Node Odd Node: (N+1)/2 = Majority Node 6/2+1 = 4 7+1/2 = 4 Number Of Members Alive Majority = Cannot Elect Primary And Write Fail
  • 14. READ And WRITE Replica Settings Read Preference • Primary • Primary Preferred • Secondary • Secondary Preferred • Nearest Write Preference • w: majority • w: <N> • j: true
  • 15. Secondary Member Type Type Read Accept Vote Become Primary Priority 0 Yes Yes No Hidden No Yes No Delay No Yes No Arbiter No Yes No
  • 16. Replica Set Best Practices db.collection.find().readPref('nearest', [ { 'dc': 'east' } ]) ▪ Use hostnames when configuring replica set members rather than IP-addresses ▪ Ensure that the replica set has an odd number of voting members ▪ Oplog Recovery Window need to maintain minimum 24 hours ▪ 3 type of connection URI ▪ Consistency Read: primary ▪ Eventually Consistent: SecondaryPreferred, maxStalnessSeconds ▪ write Concern w: 1 ▪ Nearest read preference , tag set and maxStalnessSeconds read setting need use in Geographically Distributed Members
  • 17. Replica Set Best Practices - 2 ▪ Use x.509 Certificate for Membership Authentication security: clusterAuthMode: x509 net: tls: mode: requireTLS certificateKeyFile: <path to its TLS/SSL certificate and key file> CAFile: <path to root CA PEM file to verify received certificate> clusterFile: <path to its certificate key file for membership authentication> bindIp: localhost,<hostname(s)|ip address(es)>
  • 18. Replica Set Best Practices - 3 • Enable Authorization • Create different role for Database Administration, Operation and Admin OPS User DBA User Super User List Database (show dbs) List Database (show dbs) ALL ACCESS(root) List collections (show collections) except admin,local,config database. List collections (show collections) except admin,local,config database. Read collection data (db.coll.find()) Read collection data (db.coll.find()) Able to check collection stats (db.coll.stats()) Able to check collection stats (db.coll.stats()) Able to check db stats (db.stats()) Able to check db stats (db.stats()) Able to create Index Able to create Index Able to see the current running queries (db.currentOp()) Able to see the current running queries (db.currentOp()) Able to kill the queries Able to kill the queries Able to see the replication status Able to see the replication status Able to see the list of users Able to see the list of users Able to see the inherited privileges of each role Able to see the inherited privileges of each role Able to rotate the log file Able to rotate the log file Able to drop Index Able to shutdown mongo Able to Lock writes Able to configure the replica set Able to change the replica set IP Able to run compaction against collection
  • 19. Replica Set Best Practices - 4
    ▪ Run the mongod service under a non-privileged account with a nologin/false shell
    ▪ Do not allow MongoDB to talk to the internet under any circumstances
      ▪ Configure security groups to block outbound connections to the internet (network level)
      ▪ Configure iptables/UFW to block or control outbound traffic (instance level)
    ▪ Use the XFS filesystem
    ▪ Turn off atime for the storage volume that holds the database files
      ▪ <MongoDB Data Partition> xfs rw,noatime,attr2,inode64,noquota 0 0
    ▪ Disable Transparent Huge Pages; MongoDB performs better with normal virtual memory pages
      ▪ $ echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
      ▪ $ echo "never" > /sys/kernel/mm/transparent_hugepage/defrag
  • 20. Replica Set Best Practices - 5
    ▪ Disable NUMA in your BIOS, or start mongod through numactl with an interleaved memory policy
      ▪ Edit /etc/systemd/system/multi-user.target.wants/mongod.service
      ▪ ExecStart=/usr/bin/numactl --interleave=all /usr/bin/mongod --config /etc/mongod.conf
    ▪ Keep the readahead setting small on the block devices that store the database files, because most access is non-sequential. For example, a readahead of 32 sectors (16 KB) is a good starting point.
    ▪ Apply these ulimit settings:
      ▪ -f (file size): unlimited
      ▪ -t (CPU time): unlimited
      ▪ -v (virtual memory): unlimited
      ▪ -n (open files): 64000
      ▪ -m (memory size): unlimited
      ▪ -u (processes/threads): 32000
  • 21. Replica Set Best Practices - 6
    • Network Stack Tuning
      net.core.somaxconn = 4096
      net.ipv4.tcp_fin_timeout = 30
      net.ipv4.tcp_keepalive_intvl = 30
      net.ipv4.tcp_keepalive_time = 120
      net.ipv4.tcp_max_syn_backlog = 4096
      net.ipv4.tcp_keepalive_probes = 6
    • Dirty Ratio
      vm.dirty_ratio = 15
      vm.dirty_background_ratio = 5
  • 22. MONGODB EVOLUTION AND ITS RECENT EXCITING FEATURES
  • 23. Evolution of MongoDB Replica Set
  • 24. Resumable Initial Sync - From MongoDB 4.4
    ▪ A secondary performing an initial sync can attempt to resume the sync process if it is interrupted by a:
      • network error
      • collection drop
      • collection rename
    ▪ The secondary tries to resume the initial sync for 24 hours (default)
    ▪ db.adminCommand( { setParameter: 1, initialSyncTransientErrorRetryPeriodSeconds: <value> } )
  • 25. Resumable Initial Sync Monitoring - From MongoDB 4.4
  • 26. Streaming Replication - From MongoDB 4.4
  • 27. Streaming Replication - From MongoDB 4.4
    Before 4.4:
      • A single OplogFetcher thread on the secondary actively sends getMore commands against the primary's oplog collection
      • If there is data, a batch of up to 16 MB is returned
      • Each batch fetch costs a complete network round trip (RTT)
      • On a poor replica set network, replication performance is severely limited by network latency
    From 4.4:
      • New oplog entries flow continuously to the secondary instead of relying on active polling by the secondary
      • Compared with the previous method, at least half of the RTT is saved in the oplog sync process
      • Majority write performance increases by 50% on average
      • Streaming is controlled by the oplogFetcherUsesExhaust parameter (true/false, default true), which can only be set at startup: mongod --setParameter oplogFetcherUsesExhaust=<true|false>
  • 28. Minimum Oplog Retention Period - From MongoDB 4.4
    From 3.6:
      • { replSetResizeOplog: <boolean>, size: <num MB> }
      • db.adminCommand({replSetResizeOplog: 1, size: 16384})
      • db.getSiblingDB("local").oplog.rs.stats(1024*1024).maxSize
    From 4.4:
      • db.adminCommand({replSetResizeOplog: <int>, size: <double>, minRetentionHours: <double>})
      • db.adminCommand({replSetResizeOplog: 1, size: 20000, minRetentionHours: 1.5})
      • db.getSiblingDB("local").oplog.rs.stats(1024*1024).maxSize
      • db.getSiblingDB("admin").serverStatus().oplogTruncation.minRetentionHours
  • 29. Minimum Oplog Retention Period - From MongoDB 4.4
    • When a long minimum retention period is combined with a high write volume, the oplog may grow beyond its configured maximum size in order to keep the required oplog entries
    • From MongoDB 4.0 onward, MongoDB forbids you from dropping the local.oplog.rs collection
    • The oplog size can be set to any value from 990 megabytes to 1 petabyte
    • Reducing the oplog size does not automatically reclaim disk space; compact must be run on the oplog.rs collection in the local database
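The size limits above can be checked before the resize command is sent. This is a minimal sketch under stated assumptions: `buildResizeOplogCommand` is a hypothetical helper, and the upper bound treats 1 petabyte as 1024^3 megabytes (binary units), which the slide does not specify.

```javascript
const MIN_OPLOG_MB = 990;
// Assumption: 1 PB expressed in MB using binary units (1024^3 MB).
const MAX_OPLOG_MB = 1024 * 1024 * 1024;

// Hypothetical helper: validates the inputs and returns the command document
// that would be passed to db.adminCommand() on a replica set member.
function buildResizeOplogCommand(sizeMB, minRetentionHours) {
  if (sizeMB < MIN_OPLOG_MB || sizeMB > MAX_OPLOG_MB) {
    throw new RangeError(`oplog size must be between ${MIN_OPLOG_MB} MB and 1 PB`);
  }
  const command = { replSetResizeOplog: 1, size: sizeMB };
  // minRetentionHours is optional and only meaningful from MongoDB 4.4.
  if (minRetentionHours !== undefined) {
    if (minRetentionHours < 0) {
      throw new RangeError("minRetentionHours must not be negative");
    }
    command.minRetentionHours = minRetentionHours;
  }
  return command;
}

const resizeOnly = buildResizeOplogCommand(16384);
const resizeWithRetention = buildResizeOplogCommand(20000, 1.5);
console.log(resizeOnly, resizeWithRetention);
```

In mongosh the result would be used as `db.adminCommand(buildResizeOplogCommand(20000, 1.5))`, followed by the `maxSize` and `minRetentionHours` checks to confirm the change.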
  • 30. Mirrored Reads - From MongoDB 4.4
    • The primary copies a share of its read traffic to electable secondary nodes at a configured sampling rate
    • This warms the secondary's cache so that it closely resembles the primary's cache
    • When the primary goes down, a mirrored secondary can take over and serve the traffic
    • This reduces cache misses and disk load after a failover, and keeps query performance close to that of the previous primary
    • Mirrored reads are "fire-and-forget" operations; i.e., the primary does not wait for the response to a mirrored read
    • Only electable secondaries (members[n].priority greater than 0) receive mirrored reads
    • A sampling rate of "0.0" disables mirrored reads
    • The sampling rate is a number between "0.0" and "1.0"
    • A sampling rate of "1.0" results in the primary forwarding all supported reads
  • 31. Mirrored Reads - From MongoDB 4.4
    • db.adminCommand( { setParameter: 1, mirrorReads: { samplingRate: 0.10 } } )
    • db.runCommand( { serverStatus: 1, mirroredReads: 1 } )
    • Mirrored reads support the following operations:
      • count
      • distinct
      • find
      • findAndModify (specifically, the filter is sent as a mirrored read)
      • update (specifically, the filter is sent as a mirrored read)
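The sampling rate rules can be captured in a small helper that rejects out-of-range values before building the setParameter command. This is a sketch only: `buildMirrorReadsCommand` and `estimateMirroredReads` are hypothetical names, and the estimate is a rough model (supported reads multiplied by the rate), not a description of how the server samples internally.

```javascript
// Hypothetical helper: validates the rate and returns the command document
// that would be passed to db.adminCommand().
function buildMirrorReadsCommand(samplingRate) {
  if (typeof samplingRate !== "number" || samplingRate < 0 || samplingRate > 1) {
    throw new RangeError("samplingRate must be a number between 0.0 and 1.0");
  }
  return { setParameter: 1, mirrorReads: { samplingRate } };
}

// Rough model: expected number of mirrored reads out of a given number of
// supported reads on the primary. Rounded to avoid floating-point noise.
function estimateMirroredReads(supportedReads, samplingRate) {
  return Math.round(supportedReads * samplingRate);
}

const tenPercent = buildMirrorReadsCommand(0.10);
const disabled = buildMirrorReadsCommand(0.0);
console.log(tenPercent, disabled);
console.log(estimateMirroredReads(1000, 0.10));
```

A low rate keeps the extra load on secondaries small, while a higher rate keeps their caches closer to the primary's at the cost of more mirrored traffic.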
  • 32. Simultaneous Indexing - From MongoDB 4.4
    • Before version 4.4, an index build was replicated to the secondary nodes only after it had completed on the primary
    • From 4.4, indexes build simultaneously on all data-bearing replica set members
    • Index build process:
      • The primary receives the createIndexes command and writes a "startIndexBuild" oplog entry
      • Each secondary starts its own build when it replicates the "startIndexBuild" entry
      • Each member votes to commit once its index build has finished
      • The primary checks for a quorum of votes and for any key constraint violations
      • The primary then writes a "commitIndexBuild" or an "abortIndexBuild" oplog entry
  • 33. Simultaneous Indexing - From MongoDB 4.4
    Index creation command:
      db.getSiblingDB("examples").invoices.createIndexes( [ { "invoices" : 1 }, { "fulfillmentStatus" : 1 } ] )
    Setting the index commit quorum:
      db.getSiblingDB("examples").runCommand( { "setIndexCommitQuorum" : "invoices", "indexNames" : ["invoices_1", "fulfillmentStatus_1"], "commitQuorum" : "majority" } )
    • By default, index builds use the "votingMembers" commit quorum, i.e. all data-bearing voting replica set members
    • Do not use killOp to terminate an in-progress index build in replica sets or sharded clusters
    • Starting from 4.2, use db.pets.dropIndex( "catIdx" ) to drop the index instead
    • When dropIndexes is run on the primary, it creates an associated "abortIndexBuild" oplog entry
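The names passed in indexNames must match the names of the indexes being built. When no explicit name is given, MongoDB derives the name by joining each key and its direction with underscores. The sketch below derives the names from the key specifications, so that the setIndexCommitQuorum command stays in step with the createIndexes call. The helper names are hypothetical.

```javascript
// Derives the default index name from a key specification,
// e.g. { fulfillmentStatus: 1 } becomes "fulfillmentStatus_1".
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

// Hypothetical helper: builds the setIndexCommitQuorum command document for
// a set of in-progress index builds on one collection.
function buildCommitQuorumCommand(collection, keySpecs, commitQuorum) {
  return {
    setIndexCommitQuorum: collection,
    indexNames: keySpecs.map(defaultIndexName),
    commitQuorum,
  };
}

const keySpecs = [{ invoices: 1 }, { fulfillmentStatus: 1 }];
const quorumCommand = buildCommitQuorumCommand("invoices", keySpecs, "majority");
console.log(quorumCommand);
```

In mongosh, the same `keySpecs` array could feed both `createIndexes(keySpecs)` and `db.getSiblingDB("examples").runCommand(quorumCommand)`, which avoids hand-typed index names drifting from the key names.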