5 Things DBAs Should Know Before Deploying MongoDB
MongoDB is one of the most popular databases and is one of the easiest NoSQL databases to set up for a DBA. Oftentimes, relational DBAs will inherit MongoDB databases without knowing all there is to know about MongoDB. I encourage you to check out our Percona blogs as we have lots of great information for those both new and experienced with MongoDB. Don’t let the ease of installing MongoDB fool you; there are things you need to consider before deploying MongoDB. Here are five things DBAs should know before deploying MongoDB in production.
1) Enable Authentication and Authorization
Security is of utmost importance to your database. Gone are the days when security was disabled by default for MongoDB, but it’s still easy to start MongoDB without security. Without security, and with your database bound to a public IP, anyone can connect to your database and steal your data. Adding a few MongoDB security options to your configuration file goes a long way toward protecting that data, and setting up authentication and authorization is one of the simplest ways to secure your MongoDB database. You can also configure MongoDB to use native LDAP for authentication. The most important configuration option is authorization, which enables users and roles and requires clients to authenticate and hold the proper roles before they can access your data.
security:
  authorization: enabled
  keyFile: /path/to/your.keyfile
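The keyfile referenced in that configuration has to exist before mongod starts. As a rough sketch (not an official procedure), the Python below generates one the same way the commonly used `openssl rand -base64 756` one-liner does. The function name `write_keyfile` and the demo path are made up for illustration; the length and permission constraints in the comments are MongoDB's.

```python
import base64
import os
import stat
import tempfile


def write_keyfile(path: str, n_bytes: int = 756) -> str:
    """Write a random base64 key to `path`, readable only by the owner."""
    # 756 random bytes encode to 1008 base64 characters, which sits inside
    # the 6-1024 character range MongoDB accepts for a keyfile.
    key = base64.b64encode(os.urandom(n_bytes)).decode("ascii")
    # O_EXCL refuses to overwrite an existing key. Mode 0o600 matters because
    # mongod on Unix rejects keyfiles with group or world permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key + "\n")
    return key


if __name__ == "__main__":
    # Hypothetical location; in practice this is the path named in keyFile.
    demo_path = os.path.join(tempfile.mkdtemp(), "your.keyfile")
    key = write_keyfile(demo_path)
    print(len(key), oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
```

Every member of the replica set needs a copy of the same file, so generate it once and distribute it rather than running this on each node.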
2) Deploy Replica Sets at a Minimum
A great feature of MongoDB is how easy it is to set up replication. Not only is it easy, but it also gives you built-in high availability. When your primary crashes or goes down for maintenance, MongoDB automatically fails over to one of your secondaries and your application is back online without any intervention. Replica sets allow you to offload reads, as long as your application can tolerate eventual consistency. They also allow you to offload your backups to a secondary. Another important thing to know is that only one member of a replica set can accept client writes at a time; if you want multiple nodes to accept writes, you should look into sharded clusters. There are some tradeoffs with replica sets that you should also be aware of: when reading from a secondary, depending on your read preference, you may be reading stale data or data that hasn’t been acknowledged by all members of the replica set.
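To make the read-offloading tradeoff concrete, here is a minimal sketch that builds a connection string listing every member and asking for secondary reads with a staleness bound. The host names, the set name `rs0`, and the helper `replica_set_uri` are placeholders; `replicaSet`, `readPreference`, and `maxStalenessSeconds` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set,
                    read_preference="secondaryPreferred",
                    max_staleness_seconds=120):
    """Build a MongoDB URI that lists every replica set member."""
    options = {"replicaSet": replica_set, "readPreference": read_preference}
    # A staleness bound only makes sense for non-primary reads, and
    # MongoDB rejects values below 90 seconds.
    if read_preference != "primary":
        if max_staleness_seconds < 90:
            raise ValueError("maxStalenessSeconds must be at least 90")
        options["maxStalenessSeconds"] = max_staleness_seconds
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


if __name__ == "__main__":
    # Hypothetical three-member replica set.
    members = ["db1.example.net:27017",
               "db2.example.net:27017",
               "db3.example.net:27017"]
    print(replica_set_uri(members, "rs0"))
```

Listing all members lets the driver find the new primary after a failover, and `maxStalenessSeconds` keeps reads away from secondaries that have fallen too far behind.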
3) Backups
Backups are just as important with MongoDB as they are with any other database. Tools like mongodump, mongoexport, Percona Backup for MongoDB, and Ops Manager (Enterprise Edition only) cover, between them, point-in-time recovery, oplog backups, hot backups, and full and incremental backups. As mentioned, backups can be run from any node in your replica set. The best practice is to run your backup from a secondary node so you don’t put unnecessary pressure on your primary node. You can also take file system snapshots of your data, provided you pause writes to the node you’re snapshotting before freezing the file system, so that the snapshot of your MongoDB database is consistent.
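As one possible way to script a backup from a secondary, the sketch below assembles a mongodump command line without running it. The URI, archive path, and the helper `mongodump_command` are illustrative assumptions; `--uri`, `--oplog`, `--gzip`, and `--archive` are existing mongodump options, and `--oplog` only applies to full dumps of a replica set member.

```python
def mongodump_command(uri: str, archive_path: str) -> list:
    """Return an argv list for a compressed, oplog-consistent full dump."""
    return [
        "mongodump",
        f"--uri={uri}",              # readPreference in the URI picks the node
        "--oplog",                   # capture oplog entries written during the dump
        "--gzip",                    # compress the output
        f"--archive={archive_path}", # write a single archive file
    ]


if __name__ == "__main__":
    # Hypothetical replica set and backup location.
    uri = ("mongodb://db1.example.net:27017,db2.example.net:27017/"
           "?replicaSet=rs0&readPreference=secondary")
    print(" ".join(mongodump_command(uri, "/backups/full.archive.gz")))
```

The list can be handed to `subprocess.run(cmd, check=True)` once credentials and paths are filled in for your environment.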
4) Indexes Are Just as Important
Indexes are just as important for MongoDB as they are for relational databases. Just like with relational databases, indexes speed up your queries by reducing the amount of data a query has to examine to find its results. Indexes also help your working set fit more easily into memory. Pay attention to your query patterns and look for full collection scans in your log files to find queries that could benefit from an index. In addition, follow the ESR rule (Equality, Sort, Range) when creating compound indexes: put fields matched by equality first, then fields used for sorting, then fields filtered by range. Just like in relational databases, indexes aren’t free, as MongoDB has to update them every time there’s an insert, delete, or update, so make sure your indexes are being used and aren’t unnecessarily slowing down your writes.
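The ESR ordering is easy to get wrong by hand, so here is a small sketch of it. The helper `esr_index_keys` and the field names are hypothetical; the output is a list of (field, direction) pairs, which is the shape drivers such as PyMongo accept in `create_index`.

```python
ASCENDING, DESCENDING = 1, -1


def esr_index_keys(equality, sort, range_fields):
    """Order compound index keys as Equality, then Sort, then Range."""
    # Equality fields first: direction does not matter for exact matches.
    keys = [(field, ASCENDING) for field in equality]
    # Sort fields next, keeping the directions the query sorts by.
    keys += list(sort)
    # Range fields last, skipping any field that is already in the index.
    seen = {field for field, _ in keys}
    keys += [(field, ASCENDING) for field in range_fields if field not in seen]
    return keys


if __name__ == "__main__":
    # Hypothetical query: status == "A", qty > 10, newest first.
    print(esr_index_keys(equality=["status"],
                         sort=[("created_at", DESCENDING)],
                         range_fields=["qty"]))
```

For that example query the resulting key order is status, created_at, qty, so the index can satisfy the sort without an in-memory sort stage.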
5) Whenever Possible, Working Set < RAM
As with any database, fitting your data into RAM allows for faster reads than going to disk, and MongoDB is no different. Knowing how much data MongoDB has to read in for your queries can help you determine how much RAM you should allocate to your database. For example, if your queries require a working set of 50 GB and you only have 32 GB of RAM allocated to the WiredTiger cache, MongoDB will constantly read more of the working set in from disk and page other data out to make room for it. This leads to slower query performance and a system that is constantly using all of its available memory for its cache. Conversely, if you have a 50 GB working set and 100 GB of RAM for your WiredTiger cache, the working set fits completely into memory, and as long as other data doesn’t page it out of the cache, MongoDB should serve reads much faster because they all come from memory.
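The arithmetic in that example can be sketched in a few lines. The function names are made up; the default-size formula (the larger of 50% of (RAM minus 1 GB) or 256 MB) is WiredTiger's documented default when no cache size is configured. Treat the working set figure as an estimate you supply, since MongoDB does not report it directly.

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache: max(50% of (RAM - 1 GB), 256 MB)."""
    return max(0.5 * (ram_gb - 1), 0.25)


def fits_in_cache(working_set_gb: float, cache_gb: float) -> bool:
    """True when the estimated working set fits inside the cache."""
    return working_set_gb <= cache_gb


if __name__ == "__main__":
    # The two scenarios from the text: 50 GB working set, 32 GB vs 100 GB cache.
    print(fits_in_cache(50, 32))    # False: reads keep going to disk
    print(fits_in_cache(50, 100))   # True: working set stays in memory
    # With default settings, a 65 GB host ends up with a 32 GB cache.
    print(default_wiredtiger_cache_gb(65))
```

With default settings, holding a 50 GB working set entirely in the WiredTiger cache would take roughly a 101 GB host, which is why the cache size is worth checking before sizing hardware.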
Avoiding reads from disk is not always possible or practical. To assess how many reads you’re doing from disk, you can use commands like db.serverStatus() or measure it with tools like Percona Monitoring and Management. Be sure your data lives on XFS, the file system MongoDB recommends, and whenever possible use solid-state drives (SSDs) to speed up disk performance. Database workloads are I/O heavy, so you should also provision enough IOPS for your database servers to avoid disk bottlenecks as much as possible. Multi-core systems also tend to work better, as they allow faster checkpointing for the WiredTiger storage engine, but you will still only go as fast as the disks can take you.
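One rough way to read db.serverStatus() for this purpose is to compare how many pages were requested from the WiredTiger cache with how many had to be read in from disk. The sketch below works on a plain dictionary shaped like the serverStatus output; the counter names come from the `wiredTiger.cache` section, while the function names and the sample numbers are invented for illustration.

```python
def cache_miss_ratio(server_status: dict) -> float:
    """Fraction of cache page requests that needed a read from disk."""
    cache = server_status["wiredTiger"]["cache"]
    requested = cache["pages requested from the cache"]
    read_in = cache["pages read into cache"]
    return read_in / requested if requested else 0.0


def cache_fill_ratio(server_status: dict) -> float:
    """Fraction of the configured cache currently in use."""
    cache = server_status["wiredTiger"]["cache"]
    return cache["bytes currently in the cache"] / cache["maximum bytes configured"]


if __name__ == "__main__":
    # Made-up sample values, not output from a real server.
    sample = {"wiredTiger": {"cache": {
        "pages requested from the cache": 1_000_000,
        "pages read into cache": 50_000,
        "bytes currently in the cache": 24 * 1024**3,
        "maximum bytes configured": 32 * 1024**3,
    }}}
    print(cache_miss_ratio(sample), cache_fill_ratio(sample))
```

These counters are cumulative since the server started, so comparing two samples taken some time apart gives a better picture of current behavior than a single reading.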
While MongoDB is easy to get started with and has a low barrier to entry, just like any other database there are some key things that you, as a DBA, should consider before deploying it. We’ve covered enabling authentication and authorization to ensure you have a secure deployment. We’ve also covered deploying replica sets to ensure high availability, backups to keep your data safe, the importance of indexes to your query performance, and having sufficient memory to cover your queries, along with what to do if you don’t. We hope this gives you a better idea of how to deploy MongoDB and how to support it. Thanks for reading!
by Mike Grayson via Percona Database Performance Blog