Backup and Restore of MongoDB Deployment on Kubernetes

Every database environment requires a robust backup strategy; it is a fundamental part of running a successful and reliable database. No matter the size of the database, the function of the application, or how technologically advanced a company is, backups are a requirement for everyone.

As Solutions Engineers, we speak to database users from all types of companies, ranging from startups to those running the most complex database environments in use today. Interestingly enough, when talking about backups, we hear several concerning statements, such as: "We never used backups in the past, so we don't need them in this new environment", "cloud services never fail" (hint: they do), and "this cluster is too big to fail." It's never an issue until something happens to your environment and you are unable to recover data, putting your entire company at risk.

The adoption of databases on Kubernetes (K8s) and other cloud-native platforms is definitely on the rise, and there are multiple tools and approaches to deploying MongoDB on K8s. Assuming that you already have MongoDB up and running on K8s, how do you implement a backup strategy? Doing it yourself involves a healthy amount of error-prone, manual work: you would need to configure K8s Jobs from scratch, configure mongodump and all its parameters, and set up a PersistentVolume with its accompanying Claims. Additionally, you would need to take care of your PersistentVolume's durability. And what about streaming the backup to remote storage, or dealing with the backup of a complex, sharded cluster?
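To make that manual work concrete, a do-it-yourself approach might look something like the following Kubernetes CronJob sketch. Everything in it (the job name, the image, the MONGODB_URI variable, and the backup-pvc claim) is hypothetical, and you would still have to provision the PersistentVolume and look after its durability yourself:

```yaml
apiVersion: batch/v1beta1        # batch/v1 on newer Kubernetes versions
kind: CronJob
metadata:
  name: mongodump-backup         # hypothetical name
spec:
  schedule: "0 0 * * *"          # daily at midnight
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mongodump
              image: mongo:4.2
              # Dump over a driver connection into the mounted volume.
              command: ["sh", "-c"]
              args:
                - mongodump --uri "$MONGODB_URI" --gzip --archive=/backup/dump.archive
              volumeMounts:
                - name: backup-volume
                  mountPath: /backup
          volumes:
            - name: backup-volume
              persistentVolumeClaim:
                claimName: backup-pvc   # you must create this PVC (and its PV) yourself
```

And this still leaves scheduling retention, verifying the dump, and uploading it to remote storage entirely up to you.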

There are free tools that help you streamline the entire MongoDB deployment process on K8s, including backups. The critical nature of backups doesn't have to cost you any money to address, either. Percona offers free, enterprise-grade solutions for MongoDB, including the Percona Kubernetes Operator for MongoDB (PSMDB Operator) and Percona Backup for MongoDB (PBM).

Let’s explore how to take and restore backups using PSMDB Operator deployed on AWS Elastic Kubernetes Service (AWS EKS).


How Percona Backup for MongoDB (PBM) Works

When deployed outside of K8s, PBM requires a running pbm-agent process on each node (next to the mongod instance) in the cluster/replica set. PBM uses its own 'control collections' in the admin database to store configuration and relay commands from the user (who uses the pbm CLI) to the pbm-agent processes. PBM's backups are 'logical' style, the same as mongodump's: the data is copied over a database driver connection rather than by copying the underlying data files on disk.

When you deploy a PSMDB cluster using the PSMDB Operator, pbm-agent is automatically deployed in each pod as a sidecar container next to the mongod container. The PSMDB Operator writes commands to the PBM control collections directly (replacing, in a way, what the pbm CLI does outside of K8s deployments), controlling the entire backup process. The PSMDB Operator supports two types of backups, on-demand and scheduled, both controlled entirely by the operator.

Backups taken by the PSMDB Operator can be stored in any S3-compatible storage, be it AWS S3, Google Cloud Storage, or locally deployed, cloud-native MinIO storage. A backup contains a metadata file, a dump of all collections from your database, and an oplog dump covering the timespan of the backup.


We have a running PSMDB replica set deployed with the PSMDB Operator in an AWS EKS cluster. Please note that we use PSMDB Operator v1.4.0 (the newest release at the time of writing). To deploy it, we followed these instructions and used all default settings.

Check if your PSMDB cluster is running correctly:

$ kubectl get pods

NAME                                               READY   STATUS    RESTARTS   AGE
my-cluster-name-rs0-0                              2/2     Running   0          113s
my-cluster-name-rs0-1                              2/2     Running   2          72s
my-cluster-name-rs0-2                              2/2     Running   1          42s
percona-server-mongodb-operator-568f85969c-5hqbw   1/1     Running   0          22m

Access secrets: deploy/backup-s3.yaml

Let's start by adding our AWS access key and secret access key. The operator will use these keys to access your S3 bucket (each cloud provider has its own method of distributing these keys). The keys that you put into K8s Secrets must be base64 encoded. You can encode your keys in the bash CLI by running:

$ echo -n 'YOUR_KEY' | base64

$ cat deploy/backup-s3.yaml

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
  AWS_ACCESS_KEY_ID: #############
  AWS_SECRET_ACCESS_KEY: ##############
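As a quick sanity check, you can verify that a value is encoded the way the Secret expects. This sketch uses a placeholder key, not a real credential:

```shell
# Encode a placeholder key exactly as it should appear in the Secret.
# printf avoids the trailing newline that plain echo would add.
encoded=$(printf 'EXAMPLEKEY' | base64)
echo "$encoded"                            # RVhBTVBMRUtFWQ==

# Decode it back to confirm the round trip.
printf '%s' "$encoded" | base64 --decode   # EXAMPLEKEY
```

If the decoded value doesn't match what you started with, the Secret will silently carry a broken credential, so this check is cheap insurance.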

Create a secret in K8s cluster with the following command:

$ kubectl apply -f deploy/backup-s3.yaml

secret/my-cluster-name-backup-s3 created

S3 bucket: deploy/cr.yaml

Next, we need to edit the storages section under backup in the deploy/cr.yaml file so we can send our backups to an S3 bucket.
$ cat deploy/cr.yaml 

  storages:
    s3-us-east: #(backup storage name)
      type: s3
      s3:
        bucket: psmdb-operator-backup   #(bucket name)
        credentialsSecret: my-cluster-name-backup-s3   #(reference to credentials set in backup-s3.yaml)
        region: us-east-1

Apply changes:

$ kubectl apply -f deploy/cr.yaml

perconaservermongodb.psmdb.percona.com/my-cluster-name configured

We are ready to take backups now!

On-Demand Backup

An on-demand backup can be taken at any point in time. The backup control tooling is distributed together with the operator code; based on the requested details, it will use the pre-configured storage to store the on-demand backup.

Let's start with editing deploy/backup/backup.yaml to ensure we are all set. psmdbCluster should match our cluster name, and storageName should match the storage defined in the previous steps.
$ cat deploy/backup/backup.yaml 

kind: PerconaServerMongoDBBackup
metadata:
  name: backup1
spec:
  psmdbCluster: my-cluster-name
  storageName: s3-us-east

Run the backup with the following command:

$ kubectl apply -f deploy/backup/backup.yaml

perconaservermongodbbackup.psmdb.percona.com/backup1 created

If we set up everything correctly, our backup should be uploaded to the S3 bucket. To check its status, run:

$ kubectl describe psmdb-backup backup1

If you look at your AWS S3 console, you should see the backup files there.

Automated Backup

The second type of backup is a scheduled backup. The tasks section under backup in the deploy/cr.yaml file can be edited to schedule fully automated backups, configured using the UNIX cron string format. Let's say we want to take a backup every day at midnight; we edit the deploy/cr.yaml file:

$ cat deploy/cr.yaml 

  tasks:
    - name: daily-backup
      enabled: true
      schedule: "0 0 * * *"
      storageName: s3-us-east
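For reference, the schedule field uses the standard five-field UNIX cron syntax (minute, hour, day of month, month, day of week). A few example values:

```yaml
schedule: "0 0 * * *"     # every day at midnight (as above)
schedule: "0 */6 * * *"   # every six hours, on the hour
schedule: "30 2 * * 0"    # every Sunday at 02:30
schedule: "0 1 1 * *"     # the first day of every month at 01:00
```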

Apply changes:

$ kubectl apply -f deploy/cr.yaml

perconaservermongodb.psmdb.percona.com/my-cluster-name configured

Restoring Backups

To restore a backup, we need to find the backup name. We can obtain a list of all backups by using the following command:

$ kubectl get psmdb-backup

NAME      CLUSTER           STORAGE      DESTINATION            STATUS   COMPLETED   AGE
backup1   my-cluster-name   s3-us-east   2020-06-11T07:52:27Z   ready    28m         28m

The backup restoration configuration is in deploy/backup/restore.yaml. Let's ensure the appropriate backup name is specified:

$ cat deploy/backup/restore.yaml 

kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: my-cluster-name
  backupName: backup1

To restore the backup, we execute the following command:

$ kubectl apply -f deploy/backup/restore.yaml

perconaservermongodbrestore.psmdb.percona.com/restore1 created

To check the restore status, use:

$ kubectl describe psmdb-restore restore1


The Percona Kubernetes Operator for MongoDB, utilizing Percona Backup for MongoDB, is a Kubernetes-idiomatic way to run, back up, and restore a MongoDB replica set. If you'd like to learn more about these tools, check out our documentation.

by Michal Nosek via Percona Database Performance Blog