Deploy MongoDB Cluster as a Microservice on Kubernetes with Persistent Storage
By Bikram Singh / Oct 05, 2018
Overview
In this post we will learn to deploy a MongoDB replica set (cluster) as a microservice running in Docker containers on Kubernetes. Since MongoDB is a database, its data must persist even if a container is deleted and recreated; to achieve this we will use the Persistent Volume feature in Kubernetes to allocate NFS-backed volumes to the containers.
Prerequisites
To complete this article we need a Kubernetes cluster up and running with service discovery enabled via DNS (i.e. KubeDNS).
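One quick way to confirm that DNS-based service discovery is available (a sketch, assuming KubeDNS runs in the kube-system namespace as in a default install):
kubectl get svc -n kube-system kube-dns
kubectl get pods -n kube-system -l k8s-app=kube-dns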
Things to consider when running MongoDB in containers
- MongoDB database nodes are stateful. In the event that a container fails, and is rescheduled, it’s undesirable for the data to be lost (it could be recovered from other nodes in the replica set, but that takes time). To solve this, features such as the Persistent Volume abstraction in Kubernetes can be used to map what would otherwise be an ephemeral MongoDB data directory in the container to a persistent location where the data survives container failure and rescheduling.
- MongoDB database nodes within a replica set must communicate with each other – including after rescheduling. All of the nodes within a replica set must know the addresses of all of their peers, but when a container is rescheduled, it is likely to be restarted with a different IP address. For example, all containers within a Kubernetes Pod share a single IP address, which changes when the pod is rescheduled. With Kubernetes, this can be handled by associating a Kubernetes Service with each MongoDB node, which uses the Kubernetes DNS service to provide a hostname for the service that remains constant through rescheduling.
- Once each of the individual MongoDB nodes is running (each within its own container), the replica set must be initialized and each node added. This is likely to require some additional logic beyond that offered by off the shelf orchestration tools. Specifically, one MongoDB node within the intended replica set must be used to execute the rs.initiate and rs.add commands.
Ref: mongodb.com
We will create a MongoDB replica set in a single Kubernetes cluster. The replica set will have 3 members: 1 primary and 2 secondaries.
Each member of the replica set will run as its own pod with a service exposed using NodePort. We will use service discovery in Kubernetes so that all the replica set members can talk to each other; even if a pod's IP address changes when its container is recreated, the members can still reach one another through the services we create. We will verify this with a quick DNS lookup after listing the services below.
We will create 3 MongoDB replica set members:
mongo-node-1
mongo-node-2
mongo-node-3
We will also create 3 services, one attached to each member:
mongo-node-1 - service for mongo-node-1 (container)
mongo-node-2 - service for mongo-node-2 (container)
mongo-node-3 - service for mongo-node-3 (container)
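Once the services exist, a quick way to check that the DNS names resolve from inside the cluster is to run nslookup from any pod (a sketch, assuming the default namespace and the default cluster.local cluster domain; the busybox utility pod shown in the pod listing later works well for this):
kubectl exec -it busybox-2125412808-h9wd0 -- nslookup mongo-node-1
# should resolve mongo-node-1.default.svc.cluster.local to the ClusterIP of the mongo-node-1 service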
For persistent storage I am using an external NFS server where I have created 3 directories, named after the respective replica set members. I will create 3 persistent volumes in Kubernetes, 10GB each, for the replica set members using the YAML below.
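For reference, a minimal sketch of the directory and export setup on the NFS server (10.9.80.58 here); the export options are assumptions, adjust them to your environment:
# on the NFS server
sudo mkdir -p /kubernetes/mongo-node-1 /kubernetes/mongo-node-2 /kubernetes/mongo-node-3
# /etc/exports entry allowing the Kubernetes nodes to mount the share read-write
# /kubernetes *(rw,sync,no_subtree_check,no_root_squash)
sudo exportfs -ra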
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-1-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: mongo-node-1-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
nfs:
# FIXME: use the right IP
server: 10.9.80.58
path: "/kubernetes/mongo-node-1"
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-2-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: mongo-node-2-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
nfs:
# FIXME: use the right IP
server: 10.9.80.58
path: "/kubernetes/mongo-node-2"
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-3-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: mongo-node-3-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
nfs:
# FIXME: use the right IP
server: 10.9.80.58
path: "/kubernetes/mongo-node-3"
ubuntu@kube-apiserver-1:~/mongodb$
Then we need to create a persistent volume claim to bind each of these volumes to a replica set member, by referencing the claims in our replica set pod definitions.
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-1-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mongo-node-1-pvc
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
resources:
requests:
storage: 10Gi
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-2-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mongo-node-2-pvc
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
resources:
requests:
storage: 10Gi
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-3-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mongo-node-3-pvc
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
resources:
requests:
storage: 10Gi
ubuntu@kube-apiserver-1:~/mongodb$
Now we need to create the pod definitions for our 3 replica set members as below. You can see that I am passing the command arguments (mongod --replSet rs0 --bind_ip_all) when the mongod process starts in the container, which sets the replica set name and binds mongod to all the IPs on the container. If you don't pass a bind_ip argument then, starting with MongoDB 3.6, the default bind_ip is 127.0.0.1 and you won't be able to access the DB from outside the container. Once these replica set members are created we need to connect to any one of them, initialize the replica set, and add the other members.
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-1.yaml
apiVersion: v1
kind: Service
metadata:
name: mongo-node-1
labels:
name: mongo-node-1
spec:
type: NodePort
ports:
- port: 27017
targetPort: 27017
protocol: TCP
name: mongo-node-1
selector:
name: mongo-node-1
---
apiVersion: v1
kind: ReplicationController
metadata:
name: mongo-node-1-rc
labels:
name: mongo-node-1-rc
spec:
replicas: 1
selector:
name: mongo-node-1
template:
metadata:
labels:
name: mongo-node-1
spec:
containers:
- name: mongo-node-1
image: mongo
command:
- mongod
- "--replSet"
- rs0
- "--bind_ip_all"
ports:
- containerPort: 27017
volumeMounts:
- name: mongo-node-1-db
mountPath: /data/db
volumes:
- name: mongo-node-1-db
persistentVolumeClaim:
claimName: mongo-node-1-pvc
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-2.yaml
apiVersion: v1
kind: Service
metadata:
name: mongo-node-2
labels:
name: mongo-node-2
spec:
type: NodePort
ports:
- port: 27017
targetPort: 27017
protocol: TCP
name: mongo-node-2
selector:
name: mongo-node-2
---
apiVersion: v1
kind: ReplicationController
metadata:
name: mongo-node-2-rc
labels:
name: mongo-node-2-rc
spec:
replicas: 1
selector:
name: mongo-node-2
template:
metadata:
labels:
name: mongo-node-2
spec:
containers:
- name: mongo-node-2
image: mongo
command:
- mongod
- "--replSet"
- rs0
- "--bind_ip_all"
ports:
- containerPort: 27017
volumeMounts:
- name: mongo-node-2-db
mountPath: /data/db
volumes:
- name: mongo-node-2-db
persistentVolumeClaim:
claimName: mongo-node-2-pvc
ubuntu@kube-apiserver-1:~/mongodb$ cat mongo-node-3.yaml
apiVersion: v1
kind: Service
metadata:
name: mongo-node-3
labels:
name: mongo-node-3
spec:
type: NodePort
ports:
- port: 27017
targetPort: 27017
protocol: TCP
name: mongo-node-3
selector:
name: mongo-node-3
---
apiVersion: v1
kind: ReplicationController
metadata:
name: mongo-node-3-rc
labels:
name: mongo-node-3-rc
spec:
replicas: 1
selector:
name: mongo-node-3
template:
metadata:
labels:
name: mongo-node-3
spec:
containers:
- name: mongo-node-3
image: mongo
command:
- mongod
- "--replSet"
- rs0
- "--bind_ip_all"
ports:
- containerPort: 27017
volumeMounts:
- name: mongo-node-3-db
mountPath: /data/db
volumes:
- name: mongo-node-3-db
persistentVolumeClaim:
claimName: mongo-node-3-pvc
ubuntu@kube-apiserver-1:~/mongodb$
Let's create the Persistent Volumes, Persistent Volume Claims and replica set pods
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-1-pv.yaml
persistentvolume "mongo-node-1-pv" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-2-pv.yaml
persistentvolume "mongo-node-2-pv" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-3-pv.yaml
persistentvolume "mongo-node-3-pv" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-1-pvc.yaml
persistentvolumeclaim "mongo-node-1-pvc" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-2-pvc.yaml
persistentvolumeclaim "mongo-node-2-pvc" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-3-pvc.yaml
persistentvolumeclaim "mongo-node-3-pvc" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-1.yaml
service "mongo-node-1" created
replicationcontroller "mongo-node-1-rc" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-2.yaml
service "mongo-node-2" created
replicationcontroller "mongo-node-2-rc" created
ubuntu@kube-apiserver-1:~/mongodb$ sudo kubectl create -f mongo-node-3.yaml
service "mongo-node-3" created
replicationcontroller "mongo-node-3-rc" created
Verification
Kubernetes cluster
ubuntu@kube-apiserver-1:~$ kubectl get nodes
NAME STATUS AGE VERSION
kube-worker-1 Ready 139d v1.7.4
kube-worker-2 Ready 139d v1.7.4
kube-worker-3 Ready 139d v1.7.4
ubuntu@kube-apiserver-1:~$ kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
ubuntu@kube-apiserver-1:~$
Persistent Volume
ubuntu@kube-apiserver-1:~$ kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
mongo-node-1-pv 10Gi RWX Retain Bound default/mongo-node-1-pvc 1h
mongo-node-2-pv 10Gi RWX Retain Bound default/mongo-node-2-pvc 1h
mongo-node-3-pv 10Gi RWX Retain Bound default/mongo-node-3-pvc 1h
nfs 500Gi RWX Retain Bound default/nfs 3d
Persistent Volume Claim
ubuntu@kube-apiserver-1:~$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
mongo-node-1-pvc Bound mongo-node-1-pv 10Gi RWX 1h
mongo-node-2-pvc Bound mongo-node-2-pv 10Gi RWX 1h
mongo-node-3-pvc Bound mongo-node-3-pv 10Gi RWX 1h
nfs Bound nfs 500Gi RWX 3d
ubuntu@kube-apiserver-1:~$
Kubernetes Services
ubuntu@kube-apiserver-1:~$ kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.20.0.1 <none> 443/TCP 140d
mongo-node-1 10.20.134.1 <nodes> 27017:32521/TCP 2h
mongo-node-2 10.20.178.21 <nodes> 27017:31164/TCP 2h
mongo-node-3 10.20.134.155 <nodes> 27017:30326/TCP 2h
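Because the services are of type NodePort, each member is also reachable from outside the cluster on its allocated node port (32521, 31164 and 30326 above). For example, to reach mongo-node-1 through any worker node (<worker-node-ip> is a placeholder for one of your node addresses):
mongo --host <worker-node-ip> --port 32521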
Kubernetes pods
ubuntu@kube-apiserver-1:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
busybox-2125412808-h9wd0 1/1 Running 1136 48d 172.200.180.26 kube-worker-1
mongo-node-1-rc-3d35h 1/1 Running 0 2h 172.200.127.15 kube-worker-2
mongo-node-2-rc-z5wtv 1/1 Running 0 2h 172.200.127.22 kube-worker-2
mongo-node-3-rc-c45wr 1/1 Running 0 2h 172.200.127.23 kube-worker-2
Our MongoDB replica set pods are running. Now let's initialize the replica set. To do this I will connect to mongo-node-1 using the mongo CLI and issue the commands below:
rs.initiate()
conf=rs.conf()
conf.members[0].host="mongo-node-1:27017"
rs.reconfig(conf)
rs.add("mongo-node-2")
rs.add("mongo-node-3")
ubuntu@kube-apiserver-1:~/mongodb$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
busybox-2125412808-h9wd0 1/1 Running 1134 48d 172.200.180.26 kube-worker-1
mongo-node-1-rc-3d35h 1/1 Running 0 33s 172.200.127.15 kube-worker-2
mongo-node-2-rc-z5wtv 1/1 Running 0 33s 172.200.127.22 kube-worker-2
mongo-node-3-rc-c45wr 1/1 Running 0 31s 172.200.127.23 kube-worker-2
ubuntu@kube-apiserver-1:~/mongodb$
ubuntu@kube-apiserver-1:~/mongodb$ mongo --host 172.200.127.15
MongoDB shell version: 2.6.10
connecting to: 172.200.127.15:27017/test
Server has startup warnings:
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten]
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten]
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten]
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten]
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2018-02-05T23:50:45.758+0000 I CONTROL [initandlisten]
>
> rs.initiate()
{
"info2" : "no configuration specified. Using a default configuration for the set",
"me" : "mongo-node-1-rc-3d35h:27017",
"ok" : 1,
"operationTime" : Timestamp(1517874756, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1517874756, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs0:SECONDARY> conf=rs.conf()
{
"_id" : "rs0",
"version" : 1,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "mongo-node-1-rc-3d35h:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"catchUpTimeoutMillis" : -1,
"catchUpTakeoverDelayMillis" : 30000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("5a78ee430c0ce5fbd023ca9e")
}
}
rs0:PRIMARY> conf.members[0].host="mongo-node-1:27017"
mongo-node-1:27017
rs0:PRIMARY> rs.reconfig(conf)
{
"ok" : 1,
"operationTime" : Timestamp(1517874871, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1517874871, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs0:PRIMARY> rs.add("mongo-node-2")
{
"ok" : 1,
"operationTime" : Timestamp(1517874896, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1517874896, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs0:PRIMARY> rs.add("mongo-node-3")
{
"ok" : 1,
"operationTime" : Timestamp(1517874901, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1517874901, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs0:PRIMARY>
Verify replica set status
rs0:PRIMARY> rs.status()
{
"set" : "rs0",
"date" : ISODate("2018-02-06T00:08:16.827Z"),
"myState" : 1,
"term" : NumberLong(1),
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"readConcernMajorityOpTime" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"appliedOpTime" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"durableOpTime" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
}
},
"members" : [
{
"_id" : 0,
"name" : "mongo-node-1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1058,
"optime" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-02-06T00:08:08Z"),
"electionTime" : Timestamp(1517874756, 2),
"electionDate" : ISODate("2018-02-05T23:52:36Z"),
"configVersion" : 4,
"self" : true
},
{
"_id" : 1,
"name" : "mongo-node-2:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 800,
"optime" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-02-06T00:08:08Z"),
"optimeDurableDate" : ISODate("2018-02-06T00:08:08Z"),
"lastHeartbeat" : ISODate("2018-02-06T00:08:16.057Z"),
"lastHeartbeatRecv" : ISODate("2018-02-06T00:08:15.046Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "mongo-node-1:27017",
"configVersion" : 4
},
{
"_id" : 2,
"name" : "mongo-node-3:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 795,
"optime" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1517875688, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2018-02-06T00:08:08Z"),
"optimeDurableDate" : ISODate("2018-02-06T00:08:08Z"),
"lastHeartbeat" : ISODate("2018-02-06T00:08:16.057Z"),
"lastHeartbeatRecv" : ISODate("2018-02-06T00:08:15.339Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "mongo-node-2:27017",
"configVersion" : 4
}
],
"ok" : 1,
"operationTime" : Timestamp(1517875688, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1517875688, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
rs0:PRIMARY>
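As a final sanity check you can write a document through the primary and read it back from a secondary, using the pod IPs from the pod listing above (a sketch; reads on a secondary require slaveOk, and the legacy insert helper works here since access control is not enabled):
# insert a test document via the primary (mongo-node-1)
mongo --host 172.200.127.15 --eval 'db.test.insert({msg: "replicated"})'
# read it back from a secondary (mongo-node-2)
mongo --host 172.200.127.22 --eval 'rs.slaveOk(); printjson(db.test.find().toArray())'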