MongoDB, the database for modern applications. If you are not familiar with MongoDB, its a general purpose, document based, distributed database.
What are we building ?
MongoDB Replica Sets are Groups of mongod processes that maintain the same dataset. Replica Sets provide high availability and redundancy. Replication provides redundancy and increases data availability. Copies of the data are replicated across nodes (mongod processes), replication provides a level of fault tolerance against the loss of a single database server.
Replication can provide increased read capacity, as applications can be configured to read from the slave nodes and write to the primary nodes. With data locality, you can spread your nodes across regions and let the applications read from the nodes closest to them.
In a replica set, a primary receives all the write operations and you can only have one primary capable of confirming writes. The primary records all the changes in its oplog.
For more information on MongoDB terminology, see their documentation
Install MongoDB
I have provisioned 3 Ubuntu 20.04 nodes which is in a private subnet, so I will be making use of private networking. MongoDB relies on reverse DNS lookups, so I will be adding the private addresses in my hosts file for each node, as all my communication between the servers will happen via private networking:
cat /etc/hosts
10.163.68.26 mongodb-1.pvt
10.163.68.12 mongodb-2.pvt
10.163.68.9 mongodb-3.pvt
This installation is for ubuntu 20.04, if you are using another distribution, have a look at their documentation.
Go ahead update the repositories, add the mongodb repostory, update and install mongodb:
sudo apt update
sudo apt install gnupg -y
wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
sudo apt update
sudo apt install mongodb-org -y
Verify that mongodb is installed:
mongod --version
db version v5.0.9
Keyfile Authorization
All 3 nodes will use the same pre-shared key to authenticate the members that will contribute to the replica set. Let's create the directory where the file will be stored:
sudo mkdir /var/lib/mongodb-pki
Use openssl
or anything similar to generate a random key:
openssl rand -base64 741 > keyfile
Copy the file over to the other nodes with scp:
$ scp ./keyfile mongodb-2.pvt:/tmp/keyfile
$ scp ./keyfile mongodb-3.pvt:/tmp/keyfile
Move the keyfile in place on all 3 nodes and change the permissions of the file:
$ sudo mv keyfile /var/lib/mongodb-pki/keyfile
$ sudo chmod 600 /var/lib/mongodb-pki/keyfile
$ sudo chown -R mongodb:mongodb /var/lib/mongodb-pki
Configure Mongod
Time to configure our mongod instances, we will configure a configuration for each node. So for the first node:
storage:
dbPath: /var/lib/mongodb
journal:
enabled: true
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
net:
port: 27017
bindIp: 127.0.0.1,10.163.68.26
processManagement:
timeZoneInfo: /usr/share/zoneinfo
#log slow operations
operationProfiling:
mode: "slowOp"
slowOpThresholdMs: 50
security:
authorization: enabled
keyFile: /var/lib/mongodb-pki/keyfile
replication:
replSetName: my-demo-replset
For the 2nd node:
storage:
dbPath: /var/lib/mongodb
journal:
enabled: true
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
net:
port: 27017
bindIp: 127.0.0.1,10.163.68.12
processManagement:
timeZoneInfo: /usr/share/zoneinfo
operationProfiling:
mode: "slowOp"
slowOpThresholdMs: 50
security:
authorization: enabled
keyFile: /var/lib/mongodb-pki/keyfile
replication:
replSetName: my-demo-replset
And for the 3rd node:
storage:
dbPath: /var/lib/mongodb
journal:
enabled: true
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
net:
port: 27017
bindIp: 127.0.0.1,10.163.68.9
processManagement:
timeZoneInfo: /usr/share/zoneinfo
operationProfiling:
mode: "slowOp"
slowOpThresholdMs: 50
security:
authorization: enabled
keyFile: /var/lib/mongodb-pki/keyfile
replication:
replSetName: my-demo-replset
Wired tiger Conf
WiredTiger is the storage engine used by MongoDB. To configure the WiredTiger cache in a mongod.conf file, you can add the following options:
# WiredTiger cache configuration
storage:
engine: "wiredTiger"
wiredTiger:
engineConfig:
cacheSizeGB: <cache_size_in_GB>
cacheSizeGB
: The size of the WiredTiger cache in gigabytes. This value should be set based on the amount of available memory on the server and the working set size of your data.
It is important to note that the cacheSizeGB setting is a target and not a hard limit. MongoDB will use as much memory as it needs but it will not use more than the specified limit.
You can also use the cachePressure
option to adjust the cache pressure.
# WiredTiger cache configuration
storage:
engine: "wiredTiger"
wiredTiger:
engineConfig:
cacheSizeGB: <cache_size_in_GB>
cachePressure: <value>
The cachePressure setting adjusts the balance between the WiredTiger cache and the file system cache. This can be used to make MongoDB more aggressive in evicting pages from the WiredTiger cache if memory is tight.
It is important to monitor your cache usage with the db.serverStatus().wiredTiger.cache
command and adjust the settings accordingly.
Enable and Start Mongod
Enable the service and start the mongod service:
$ sudo systemctl enable mongod
$ sudo systemctl restart mongod
Verify that the process has started:
sudo systemctl status mongod
mongod.service - MongoDB Database Server
Loaded: loaded (/lib/systemd/system/mongod.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-06-20 21:37:52 BST; 26s ago
Docs: https://docs.mongodb.org/manual
Main PID: 2950 (mongod)
CGroup: /system.slice/mongod.service
└─2950 /usr/bin/mongod --config /etc/mongod.conf
Jul 02 21:37:52 mongodb-1 systemd[1]: Started MongoDB Database Server.
Verify that the port is listening:
sudo netstat -tulpn | grep -E '(State|2950)'
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:27017 0.0.0.0:* LISTEN 2950/mongod
tcp 0 0 10.163.68.26:27017 0.0.0.0:* LISTEN 2950/mongod
Go ahead and do the exact same on the other 2 nodes.
Initialize the MongoDB Replica Set
When mongodb starts for the first time it allows an exception that you can logon without authentication to create the root account, but only on localhost. The exception is only valid until you create the user:
mongo --host 127.0.0.1 --port 27017
MongoDB shell version v5.0.9
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("14ac25ae-016a-4456-b451-60b5d7e866c7") }
MongoDB server version: 5.0.9
Welcome to the MongoDB shell.
>
Switch to the admin database and initialize the mongodb replicaset:
> use admin
switched to db admin
> rs.initiate()
{
"info2" : "no configuration specified. Using a default configuration for the set",
"me" : "10.163.68.26:27017",
"ok" : 1
}
Now that we have initialized our replicaset config, create the admin user and apply the root role:
my-demo-replset:PRIMARY> db.createUser({user: "mongo-admin", pwd: "mongo-pass", roles: [{role: "root", db: "admin"}]})
Successfully added user: {
"user" : "mongo-admin",
"roles" : [
{
"role" : "root",
"db" : "admin"
}
]
}
my-demo-replset:PRIMARY> exit
Now that we have exited the mongo shell, logon to mongodb with the created credentials also pointing at the replica set in the connection string:
mongo --host my-demo-replset/mongodb-1.pvt:27017 --username mongo-admin --password mongo-pass --authenticationDatabase admin
MongoDB shell version v5.0.9
connecting to: mongodb://mongodb-1.pvt:27017/?authSource=admin&gssapiServiceName=mongodb&replicaSet=my-demo-replset
2022-06-20T21:48:05.486+0100 I NETWORK [js] Successfully connected to 10.163.68.26:27017 (1 connections now open to 10.163.68.26:27017 with a 5 second timeout)
MongoDB server version: 4.0.10
my-demo-replset:PRIMARY>
Have a look at the replica set status using rs.status()
, we will see the members in our replica set. At this moment it will only be one node as we have only initiated the one node. This node will elected itself as a primary. We still need to add the other 2 nodes to the replica set.
Have a look at the status:
my-demo-replset:PRIMARY> rs.status()
{
"set" : "my-demo-replset",
"date" : ISODate("2022-06-20T20:49:19.596Z"),
"myState" : 1,
"term" : NumberLong(1),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(1562100552, 1),
"t" : NumberLong(1)
},
"readConcernMajorityOpTime" : {
"ts" : Timestamp(1562100552, 1),
"t" : NumberLong(1)
},
"appliedOpTime" : {
"ts" : Timestamp(1562100552, 1),
"t" : NumberLong(1)
},
"durableOpTime" : {
"ts" : Timestamp(1562100552, 1),
"t" : NumberLong(1)
}
},
"lastStableCheckpointTimestamp" : Timestamp(1562100512, 1),
"members" : [
{
"_id" : 0,
"name" : "10.163.68.26:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 687,
"optime" : {
"ts" : Timestamp(1562100552, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2022-06-20T20:49:12Z"),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"electionTime" : Timestamp(1562100271, 2),
"electionDate" : ISODate("2022-06-20T20:44:31Z"),
"configVersion" : 1,
"self" : true,
"lastHeartbeatMessage" : ""
}
],
"ok" : 1,
"operationTime" : Timestamp(1562100552, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1562100552, 1),
"signature" : {
"hash" : BinData(0,"ftyRJpO6BJ5U/oSfP+LzGUeGJvo="),
"keyId" : NumberLong("6709169581312704514")
}
}
}
We can also as the node if its the primary node, using rs.isMaster()
:
my-demo-replset:PRIMARY> rs.isMaster()
{
"hosts" : [
"10.163.68.26:27017"
],
"setName" : "my-demo-replset",
"setVersion" : 1,
"ismaster" : true,
"secondary" : false,
"primary" : "10.163.68.26:27017",
"me" : "10.163.68.26:27017",
"electionId" : ObjectId("7fffffff0000000000000001"),
"lastWrite" : {
"opTime" : {
"ts" : Timestamp(1562100572, 1),
"t" : NumberLong(1)
},
"lastWriteDate" : ISODate("2022-06-20T20:49:32Z"),
"majorityOpTime" : {
"ts" : Timestamp(1562100572, 1),
"t" : NumberLong(1)
},
"majorityWriteDate" : ISODate("2022-06-20T20:49:32Z")
},
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 100000,
"localTime" : ISODate("2022-06-20T20:49:41.764Z"),
"logicalSessionTimeoutMinutes" : 30,
"minWireVersion" : 0,
"maxWireVersion" : 7,
"readOnly" : false,
"ok" : 1,
"operationTime" : Timestamp(1562100572, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1562100572, 1),
"signature" : {
"hash" : BinData(0,"qn7JFMzIuaiK9sVhNkLAq7NqsMI="),
"keyId" : NumberLong("6709169581312704514")
}
}
}
Adding Nodes to the Replica Set
Let's add the second node, mongodb-2.pvt
to the replica set:
my-demo-replset:PRIMARY> rs.add("mongodb-2.pvt:27017")
{
"ok" : 1,
"operationTime" : Timestamp(1562100629, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1562100629, 1),
"signature" : {
"hash" : BinData(0,"wTNLWmROCfMKaBT4hjZGIyaiCTc="),
"keyId" : NumberLong("6709169581312704514")
}
}
}
And also adding the 3rd node, mongodb-3.pvt
the the replica set:
my-demo-replset:PRIMARY> rs.add("mongodb-3.pvt:27017")
{
"ok" : 1,
"operationTime" : Timestamp(1562100664, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1562100664, 1),
"signature" : {
"hash" : BinData(0,"WeAHspyluGFOOfyxnVS7pCKHPN8="),
"keyId" : NumberLong("6709169581312704514")
}
}
}
Now when we look at the replica set status, we should see that the 3 nodes will be part of the replica set:
my-demo-replset:PRIMARY> rs.status()
{
"set" : "my-demo-replset",
"date" : ISODate("2022-06-20T20:51:20.003Z"),
"myState" : 1,
"term" : NumberLong(1),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"readConcernMajorityOpTime" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"appliedOpTime" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"durableOpTime" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
}
},
"lastStableCheckpointTimestamp" : Timestamp(1562100629, 1),
"members" : [
{
"_id" : 0,
"name" : "10.163.68.26:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 808,
"optime" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2022-06-20T20:51:04Z"),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"electionTime" : Timestamp(1562100271, 2),
"electionDate" : ISODate("2022-06-20T20:44:31Z"),
"configVersion" : 3,
"self" : true,
"lastHeartbeatMessage" : ""
},
{
"_id" : 1,
"name" : "mongodb-2.pvt:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 50,
"optime" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2022-06-20T20:51:04Z"),
"optimeDurableDate" : ISODate("2022-06-20T20:51:04Z"),
"lastHeartbeat" : ISODate("2022-06-20T20:51:18.288Z"),
"lastHeartbeatRecv" : ISODate("2022-06-20T20:51:19.843Z"),
"pingMs" : NumberLong(1),
"lastHeartbeatMessage" : "",
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"configVersion" : 3
},
{
"_id" : 2,
"name" : "mongodb-3.pvt:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 15,
"optime" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1562100664, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2022-06-20T20:51:04Z"),
"optimeDurableDate" : ISODate("2022-06-20T20:51:04Z"),
"lastHeartbeat" : ISODate("2022-06-20T20:51:18.292Z"),
"lastHeartbeatRecv" : ISODate("2022-06-20T20:51:19.662Z"),
"pingMs" : NumberLong(1),
"lastHeartbeatMessage" : "",
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"configVersion" : 3
}
],
"ok" : 1,
"operationTime" : Timestamp(1562100664, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1562100664, 1),
"signature" : {
"hash" : BinData(0,"WeAHspyluGFOOfyxnVS7pCKHPN8="),
"keyId" : NumberLong("6709169581312704514")
}
}
}
Connecting to the Replica set
Connecting to a replicaset will always route you to the primary:
$ mongo --host my-demo-replset/mongodb-2.pvt:27017 --username mongo-admin --password mongo-pass --authenticationDatabase admin
MongoDB shell version v5.0.9
connecting to: mongodb://mongodb-2.pvt:27017/?authSource=admin&gssapiServiceName=mongodb&replicaSet=my-demo-replset
2022-06-20T21:53:03.943+0100 I NETWORK [js] Starting new replica set monitor for my-demo-replset/mongodb-2.pvt:27017
2022-06-20T21:53:03.945+0100 I NETWORK [js] Successfully connected to mongodb-2.pvt:27017 (1 connections now open to mongodb-2.pvt:27017 with a 5 second timeout)
2022-06-20T21:53:03.946+0100 I NETWORK [js] Successfully connected to 10.163.68.26:27017 (1 connections now open to 10.163.68.26:27017 with a 5 second timeout)
2022-06-20T21:53:03.946+0100 I NETWORK [js] changing hosts to my-demo-replset/10.163.68.26:27017,mongodb-2.pvt:27017,mongodb-3.pvt:27017 from my-demo-replset/mongodb-2.pvt:27017
my-demo-replset:PRIMARY>
We can verify if we are on the primary by doing the following:
my-demo-replset:PRIMARY> rs.isMaster()['ismaster']
true
my-demo-replset:PRIMARY> rs.isMaster()['me']
10.163.68.26:27017
If you exit and connect without the replica set in the connection string, then it will direct you to that node:
$ mongo --host mongodb-2.pvt:27017 --username mongo-admin --password mongo-pass --authenticationDatabase admin
MongoDB shell version v5.0.9
connecting to: mongodb://mongodb-2.pvt:27017/?authSource=admin&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("3c343871-8814-4136-859d-350c78c2a2a7") }
MongoDB server version: 4.0.10
my-demo-replset:SECONDARY>
Reading from Slave Nodes
When we have a replica set we can write to our Primary (master) node and Read from our Secondary (slave) nodes. Before we can read from the secondary nodes, we need to tell mongodb that we want to read.
By default when you connect to a secondary node and want to read you will get this exception:
my-demo-replset:SECONDARY> show databases;
2022-06-20T21:59:13.169+0100 E QUERY [js] Error: listDatabases failed:{
"operationTime" : Timestamp(1562101152, 1),
"ok" : 0,
"errmsg" : "not master and slaveOk=false",
"code" : 13435,
"codeName" : "NotMasterNoSlaveOk”,
...
We first need to instruct mongodb that we want to read:
my-demo-replset:SECONDARY> rs.slaveOk()
Now that we have done that, we can read from the secondary:
my-demo-replset:SECONDARY> show databases;
admin 0.000GB
config 0.000GB
local 0.000GB
Upgrading a MongoDB Replica Set
When you want to upgrade a mongodb node in a replica set, you can accomplish that with a rolling upgrade.
You will first shutdown a secondary, do the upgrade, start the node, then upgrade the other secondary. When all the secondaries are upgraded, then upgrade the primary. This involves initiating a stepdown on the primary which will trigger an election, where one of the secondary nodes will be promoted as a primary, once the primary is selected on another node, the old primary can be upgraded.
The correct way of shutting down a mongodb node is with using db.shutdownServer()
:
my-demo-replset:PRIMARY> use admin
switched to db admin
my-demo-replset:PRIMARY> db.shutdownServer()
And to stepdown a node from primary to secondary:
my-demo-replset:PRIMARY> rs.stepDown()
my-demo-replset:SECONDARY>
Prioritizing Primary in Replicaset
you can use the rs.conf()
command to change the priority and vote of a member in a replica set:
# Connect to the primary node of the replica set
mongo --host <primary_hostname> --port <primary_port>
# Get the current configuration of the replica set
rs.conf()
# Update the priority and vote of a member
rs.conf({
_id: <replica_set_name>,
members: [
{
_id: 0,
host: <primary_hostname>:<primary_port>,
priority: 2,
votes: 1
},
{
_id: 1,
host: <secondary_hostname_1>:<secondary_port_1>,
priority: 1,
votes: 1
},
{
_id: 2,
host: <secondary_hostname_2>:<secondary_port_2>,
priority: 0,
votes: 0
}
]
})
priority
: is a value between 0 and 1000, with 1000 being the highest possible priority. By default, all members have a priority of 1.votes
: 0 or 1, with 1 meaning the member can participate in elections and 0 meaning it cannot. By default, all members have a vote of 1.