MongoDB Backup Best Practices

In this blog, we will be discussing different backup strategies for MongoDB and their use cases, along with the pros and cons of each.

Why Take Backups?

Regular database backups are a crucial step in guarding against unintended data loss events. Whether you lose your data to mechanical failure, a natural disaster, or criminal malice, the result is the same: the data is gone. However, it doesn’t have to be lost for good; you can back it up.

Generally, there are two types of backups used with database technologies like MongoDB:

  • Logical Backups
  • Physical Backups

Additionally, we have the option of incremental backups (as part of logical backups), where we capture the deltas, or incremental data changes, made between full backups to minimize data loss in case of a disaster. We will discuss these backup options, how to perform them, and which one is better suited to particular requirements and environment setups.

Logical Backups

These are backups in which the data is dumped from the databases into backup files. A logical backup with MongoDB means dumping the data into BSON-formatted files.

During a logical backup using a client API, the data is read from the server, returned to that same API, and then serialized and written into “.bson”, “.json”, or “.csv” backup files on disk, depending upon the backup utility used.

MongoDB offers the below utility to take logical backups:

mongodump: Takes a dump/backup of the databases in “.bson” format, which can later be restored by reinserting the captured documents back into the databases.

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --db=demo --collection=events --out=/opt/backup/mongodump-2011-10-24

Note: If we don’t specify a database name or a collection name explicitly in the above “mongodump” syntax, the backup will be taken of all databases or of all collections in the given database, respectively. If “authorization” is enabled, then we must specify the “authenticationDatabase”.

Also, you should use “--oplog” to capture the incremental changes written while the backup is still running. Keep in mind that it won’t work with --db and --collection, since it only works for full-instance backups.

mongodump --host=mongodb1.example.net --port=27017 --username=user --authenticationDatabase=admin --oplog --out=/opt/backup/mongodump-2011-10-24

Pros:

  1. It can take backups at a more granular level, like a specific database or collection, which is helpful during restoration.
  2. It does not require you to halt writes on the node where the backup is running, so the node remains available for other operations.

Cons:

  1. As it reads all the data, it can be slow and will require disk reads for databases that are larger than the RAM available for the WiredTiger (WT) cache. Pressure on the WT cache increases, which slows down performance.
  2. It doesn’t capture the index data, only the index definitions, in the metadata backup file, so when restoring, all the indexes have to be built again after the collection data is reinserted. This is done in one pass through the collection after the inserts have finished, so it can add a lot of time to restores of big collections.
  3. The speed of the backup also depends on the allocated IOPS and the type of storage, since lots of reads and writes happen during this process.

Note: It is always advisable to use secondary servers for backups to avoid unnecessary performance degradation on the primary node.

As we have different types of environment setups, we should approach each one of them as below.

  1. Replica set: It is always preferred to run backups on the secondaries (see the example below).
  2. Sharded clusters: Take a backup of the config server replica set and of each shard individually, using their secondary nodes.
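For example, to point “mongodump” at a secondary, you can pass a replica set connection string together with a read preference (the replica set and host names here are illustrative):

mongodump --host="replDemo/mongodb1.example.net:27017,mongodb2.example.net:27017" --readPreference=secondary --username=user --authenticationDatabase=admin --out=/opt/backup/mongodump-2011-10-24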

Since we are discussing a distributed database system like a sharded cluster, we should also keep in mind the need for point-in-time consistency in our backups (replica set backups using mongodump are generally made consistent using “--oplog”).

Let’s discuss the scenario where the application is still writing data and cannot be stopped for business reasons. Even if we take backups of the config server and each shard separately, the backups will finish at different times because of differences in data volume, load, etc. Hence, restoring them together may introduce inconsistencies.

For that, Percona Backup for MongoDB is very useful (it uses the mongodump libraries internally), since it tails the oplog on each shard separately while the backup is still running, until completion. More references can be found in the release notes.
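As a minimal sketch of how Percona Backup for MongoDB is driven (the connection URI and backup name below are illustrative; see the PBM documentation for the full setup), backups and restores run through the “pbm” CLI:

pbm backup --mongodb-uri="mongodb://pbmuser:secretpwd@localhost:27017/"
pbm list --mongodb-uri="mongodb://pbmuser:secretpwd@localhost:27017/"
pbm restore 2020-08-26T12:26:32Z --mongodb-uri="mongodb://pbmuser:secretpwd@localhost:27017/"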

Now comes the restoration part when dealing with logical backups. As with backups, MongoDB provides the below utility for restoration purposes.

mongorestore: Restores dump files created by “mongodump”. Index re-creation takes place once the data is restored, which consumes additional memory and time.

mongorestore --host=mongodb1.example.net --port=27017 --username=user  --password --authenticationDatabase=admin --db=demo --collection=events /opt/backup/mongodump-2011-10-24/events.bson

To restore an incremental dump, we can add “--oplogReplay” to the above syntax to replay the oplog entries as well.

Note: “--oplogReplay” can’t be used with the --db and --collection flags, as it only works when restoring all the databases.
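For example, to restore the full-instance dump taken earlier with “--oplog”, replaying the captured oplog entries on top of the data:

mongorestore --host=mongodb1.example.net --port=27017 --username=user --password --authenticationDatabase=admin --oplogReplay /opt/backup/mongodump-2011-10-24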

Physical/Filesystem Backups

Physical backups involve snapshotting or copying the underlying MongoDB data files (the “dbPath”) at a point in time, and allowing the database to cleanly recover using the state captured in the snapshotted files. They are instrumental in backing up large databases quickly, especially when used with filesystem snapshots such as LVM snapshots, or with block storage volume snapshots.

There are several methods of taking filesystem-level backups, also known as physical backups, as below.

  1. Manually copying the entire data files (e.g., using rsync; speed depends on network bandwidth)
  2. LVM-based snapshots
  3. Cloud-based disk snapshots (AWS/GCP/Azure or any other cloud provider)
  4. Percona Server for MongoDB hot backup

We’ll be discussing all of the above options, but first, let’s see their pros and cons compared to logical backups.

Pros:

  1. They are at least as fast as, and usually faster than, logical backups.
  2. Can be easily copied over or shared with remote servers or attached NAS.
  3. Recommended for large datasets because of speed and reliability.
  4. Can be convenient when building new nodes within the same cluster or a new cluster.

Cons:

  1. Restoring at a more granular level, such as a specific database or collection, is impossible.
  2. Incremental backups cannot be achieved yet.
  3. A dedicated node (possibly a hidden one) is recommended for backups, as achieving consistency requires halting writes or shutting down “mongod” cleanly prior to the snapshot (see the sketch below).
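For example, a minimal LVM snapshot sketch under that constraint (the volume group, logical volume, snapshot name, and size here are illustrative) locks writes only for the instant the snapshot is created:

# Flush pending writes and block new ones so the data files are consistent
mongo --eval 'db.getSiblingDB("admin").fsyncLock()'

# Take a point-in-time snapshot of the volume holding the dbPath
lvcreate --size 10G --snapshot --name mdb-snap01 /dev/vg0/mongodb

# Release the write lock as soon as the snapshot exists
mongo --eval 'db.getSiblingDB("admin").fsyncUnlock()'

The snapshot can then be mounted and archived off-host at leisure, without further impact on the running “mongod”.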

Below is the backup time consumption comparison for the same dataset:

DB Size: 267.6GB

Index Size: <1MB (since it was only on _id for testing)

demo:PRIMARY> db.runCommand({dbStats: 1, scale: 1024*1024*1024})
{
        "db" : "test",
        "collections" : 1,
        "views" : 0,
        "objects" : 137029,
        "avgObjSize" : 2097192,
        "dataSize" : 267.6398703530431,
        "storageSize" : 13.073314666748047,
        "numExtents" : 0,
        "indexes" : 1,
        "indexSize" : 0.0011749267578125,
        "scaleFactor" : 1073741824,
        "fsUsedSize" : 16.939781188964844,
        "fsTotalSize" : 49.98826217651367,
        "ok" : 1,
        ...
}
demo:PRIMARY>

1. Hot backup

Syntax:

> use admin

switched to db admin

> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})

{ "ok" : 1 }


Note: The backup path “backupDir” should be absolute. The command also supports storing backups either on the local filesystem or in AWS S3 buckets.
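For example, per the Percona Server for MongoDB documentation, the same command can stream the backup to an S3 bucket (the bucket and path names here are illustrative):

> db.runCommand({createBackup: 1, s3: {bucket: "my-backup-bucket", path: "psmdb-node1-2020-08-26"}})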

[root@ip-172-31-37-92 tmp]# time mongo  < hot.js
Percona Server for MongoDB shell version v4.2.8-8
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("c9860482-7bae-4aae-b0e7-5d61f8547559") }
Percona Server for MongoDB server version: v4.2.8-8
switched to db admin
{
        "ok" : 1,
        ...
}
bye

real    3m51.773s
user    0m0.067s
sys     0m0.026s
[root@ip-172-31-37-92 tmp]# ls
hot  hot.js  mongodb-27017.sock  nohup.out  systemd-private-b8f44077314a49899d0a31f99b31ed7a-chronyd.service-Qh7dpD  tmux-0
[root@ip-172-31-37-92 tmp]# du -sch hot
15G     hot
15G     total

Notice that the time taken by Percona hot backup was just about 4 minutes. It is also very helpful when rebuilding a node or spinning up new instances/clusters with the same dataset. The best part is that it doesn’t lock writes or incur any notable performance hit. Even so, it is recommended to run it against the secondaries.
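A minimal sketch of reusing such a backup to seed a node (the paths, service name, and user are illustrative and vary by setup) is to stop “mongod”, place the backed-up files as the new dbPath, fix ownership, and start “mongod” again:

systemctl stop mongod
rsync -a /my/backup/data/path/ /var/lib/mongodb/
chown -R mongod:mongod /var/lib/mongodb
systemctl start mongod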

2. Filesystem Snapshot

The approximate time taken for the snapshot to be completed was only 4 minutes.
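The snapshot itself can be created with the standard EBS API; a command along these lines (reusing the volume ID visible in the output below) produces it, ideally with writes locked or “mongod” cleanly shut down first, as noted in the cons above:

aws ec2 create-snapshot --volume-id vol-0def857c44080a556 --description "This is my snapshot backup"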

[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots  --query "sort_by(Snapshots, &StartTime)[-1].{SnapshotId:SnapshotId,StartTime:StartTime}"
{
    "SnapshotId": "snap-0f4403bc0fa0f2e9c",
    "StartTime": "2020-08-26T12:26:32.783Z"
}
[root@ip-172-31-37-92 ~]# aws ec2 describe-snapshots \
> --snapshot-ids snap-0f4403bc0fa0f2e9c
{
    "Snapshots": [
        {
            "Description": "This is my snapshot backup",
            "Encrypted": false,
            "OwnerId": "021086068589",
            "Progress": "100%",
            "SnapshotId": "snap-0f4403bc0fa0f2e9c",
            "StartTime": "2020-08-26T12:26:32.783Z",
            "State": "completed",
            "VolumeId": "vol-0def857c44080a556",
            "VolumeSize": 50
        }
    ]
}

3. Mongodump

[root@ip-172-31-37-92 ~]# time nohup mongodump -d test -c collG -o /mongodump/ &
[1] 44298

[root@ip-172-31-37-92 ~]# sed -n '1p;$p' nohup.out
2020-08-26T12:36:20.842+0000    writing test.collG to /mongodump/test/collG.bson
2020-08-26T12:51:08.832+0000    [####....................]  test.collG  27353/137029  (20.0%)

Note: Just to give an idea, we can clearly see that for the same dataset where the snapshot and hot backup took only 3-5 minutes, “mongodump” took almost 15 minutes for just 20% of the dump. Hence its backup speed is definitely very slow compared to the other two options. On top of that, the only option for restoring it is “mongorestore”, which makes the whole process even slower.

Conclusion

So, which backup method is the best? It completely depends on factors like the type of infrastructure, the environment, dataset size, and load. Generally, if the dataset is around 100GB or less, logical backups are the best option, along with scheduled incremental backups, depending upon your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) needs. However, if the dataset size is larger than that, we should always go for physical backups, including incremental backups (oplogs) as well.

Interested in trying Percona Backup for MongoDB? Download it for free! 


by Divyanshu Soni via Percona Database Performance Blog
