MongoDB: Utilization of an Index on Subdocuments

MongoDB Index on Subdocument

MongoDB Index on SubdocumentMongoDB has a lot of possibilities for creating indexes. We have seen in previous articles some of the available index types and discussed explain() usage:

Using Partial and Sparse Indexes in MongoDB

MongoDB Index Types and MongoDB explain() (part 1)

MongoDB: Investigate Queries with explain() and Index Usage (part 2)

You can have a look at those if you need to review more details on the indexes and explain() investigation. In this article, we’ll see a particular case of an index created on an entire subdocument. We’ll see what you should be aware of when using it.

Create a Test Collection

Let’s create a collection with some random documents to run our tests. You can use the following javascript code to generate the sample test collection.

for (var i = 1; i <= 10000; i++) {
        db.test.insert(
                {
                        name: "name_"+i,
                        subdoc: {
                                a: i,
                                b: i*2,
                                c: i*i  
                        }
                }
        )
}

Let’s take a look at the documents we’ve created:

> db.test.find().pretty()
{
        "_id" : ObjectId("5f180f0fbdf0c5397723a6fe"),
        "name" : "name_1",
        "subdoc" : {
                "a" : 1,
                "b" : 2,
                "c" : 1
        }
}
{
        "_id" : ObjectId("5f180f0fbdf0c5397723a6ff"),
        "name" : "name_2",
        "subdoc" : {
                "a" : 2,
                "b" : 4,
                "c" : 4
        }
}
{
        "_id" : ObjectId("5f180f0fbdf0c5397723a700"),
        "name" : "name_3",
        "subdoc" : {
                "a" : 3,
                "b" : 6,
                "c" : 9
        }
}
...
...

Now create an index on the subdocument subdoc

> db.test.createIndex( { subdoc: 1  } )
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

 

Querying the Subdocument

Now that we have the index on the subdocument, we can try to run some queries and see if the index is used or not.

Run a query to find a subdocument:

> db.test.find( { subdoc: { a:220, b:440, c: 48400 } } ).pretty()
{
        "_id" : ObjectId("5f180f0fbdf0c5397723a7d9"),
        "name" : "name_220",
        "subdoc" : {
                "a" : 220,
                "b" : 440,
                "c" : 48400
        }
}

The query worked, let’s have a look at the execution plan using the explain():

> db.test.find( { subdoc: { a:220, b:440, c: 48400 } } ).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "corra.test",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "subdoc" : {
                                "$eq" : {
                                        "a" : 220,
                                        "b" : 440,
                                        "c" : 48400
                                }
                        }
                },
                "queryHash" : "07E4A30B",
                "planCacheKey" : "759877DE",
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "subdoc" : 1
                                },
                                "indexName" : "subdoc_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "subdoc" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "subdoc" : [
                                                "[{ a: 220.0, b: 440.0, c: 48400.0 }, { a: 220.0, b: 440.0, c: 48400.0 }]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "mdb1",
                "port" : 27017,
                "version" : "4.2.2-3",
                "gitVersion" : "2cdb6e50913583f627acc5de35dc4e04dbfe196f"
        },
        "ok" : 1
}

Cool, the index we’ve created on the subdocument has been used for solving the query. You can see that the FETCH stage used IXSCAN on index subdoc_1.

Let’s now try some different queries on the subdocument’s fields. We would like to execute a query to find the documents havign  b=440 and c=48400.

> db.test.find( { subdoc: { b:440, c:48400 }  }  ).pretty()
>

No results. Indeed this is not the right syntax for the query.  When querying only a few of the fields in the subdocument we need to use the dot-notation. Only exact match filters are supported for querying a subdocument.

Let’s try the query using the dot-notation:

> db.test.find( { "subdoc.b":440, "subdoc.c":48400 } ).pretty()
{
        "_id" : ObjectId("5f180f0fbdf0c5397723a7d9"),
        "name" : "name_220",
        "subdoc" : {
                "a" : 220,
                "b" : 440,
                "c" : 48400
        }
}

Ok, now we’ve got the result. Let’s take a look at the execution plan:

> db.test.find( { "subdoc.b":440, "subdoc.c":48400 } ).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "corra.test",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "subdoc.b" : {
                                                "$eq" : 440
                                        }
                                },
                                {
                                        "subdoc.c" : {
                                                "$eq" : 48400
                                        }
                                }
                        ]
                },
                "queryHash" : "D50037C0",
                "planCacheKey" : "D50037C0",
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "$and" : [
                                        {
                                                "subdoc.b" : {
                                                        "$eq" : 440
                                                }
                                        },
                                        {
                                                "subdoc.c" : {
                                                        "$eq" : 48400
                                                }
                                        }
                                ]
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "mdb1",
                "port" : 27017,
                "version" : "4.2.2-3",
                "gitVersion" : "2cdb6e50913583f627acc5de35dc4e04dbfe196f"
        },
        "ok" : 1
}

The winning plan is a COLLSCAN. MongoDB is not able to use the index.

Maybe it’s because the filter condition was not the left-prefix of the subdocument. We queried for { b:400, c:48400}. Let’s try to query on the left-prefix involving a and b only and see what happens with the execution plan.

> db.test.find( { "subdoc.a":220, "subdoc.b":440 } ).pretty()
{
        "_id" : ObjectId("5f180f0fbdf0c5397723a7d9"),
        "name" : "name_220",
        "subdoc" : {
                "a" : 220,
                "b" : 440,
                "c" : 48400
        }
}
> db.test.find( { "subdoc.a":220, "subdoc.b":440 } ).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "corra.test",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "subdoc.a" : {
                                                "$eq" : 220
                                        }
                                },
                                {
                                        "subdoc.b" : {
                                                "$eq" : 440
                                        }
                                }
                        ]
                },
                "queryHash" : "10B20F88",
                "planCacheKey" : "10B20F88",
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "$and" : [
                                        {
                                                "subdoc.a" : {
                                                        "$eq" : 220
                                                }
                                        },
                                        {
                                                "subdoc.b" : {
                                                        "$eq" : 440
                                                }
                                        }
                                ]
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "mdb1",
                "port" : 27017,
                "version" : "4.2.2-3",
                "gitVersion" : "2cdb6e50913583f627acc5de35dc4e04dbfe196f"
        },
        "ok" : 1
}

Oh no, it’s still a COLLSCAN.

The only query that can use the index on a subdocument is the exact-match. We need to provide the filter on all the fields of the subdocument. For any other different condition, MongoDB won’t be able to rely on the index.

Also, the order of the fields in the filter must be exactly the same as in the document. Look at the following queries:

> db.test.find( { subdoc: { a:220, b:440, c:48400 } } )
{ "_id" : ObjectId("5f180f0fbdf0c5397723a7d9"), "name" : "name_220", "subdoc" : { "a" : 220, "b" : 440, "c" : 48400 } }
> db.test.find( { subdoc: { c: 48400, a:220, b:440 } } )
>

Only the query with the right order returned the result. The second query didn’t return anything because the order doesn’t match the original order. Be aware of that.

So, if you need to query internal fields on a subdocument and not the entire subdocument, you need to provide dot-notation filters only.

What About the Index?

We have seen that we need to run the query writing a different filter and that we cannot benefit from the index we’ve created. So, the index on the entire subdocument is not very useful. Indeed it’s not for the purpose of our queries.

Now, if we would like to optimize the query we’re running using dot-notation we need to create additional indexes on the internal files only. For example, if the majority of our queries will use a filter on a and b only, we should create the following index:

> db.test.createIndex( { "subdoc.a":1, "subdoc.b":1 } )
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 2,
        "numIndexesAfter" : 3,
        "ok" : 1
}

Let’s now see if the query can use the new index.

> db.test.find( { "subdoc.a":220, "subdoc.b":440 } ).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "corra.test",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "subdoc.a" : {
                                                "$eq" : 220
                                        }
                                },
                                {
                                        "subdoc.b" : {
                                                "$eq" : 440
                                        }
                                }
                        ]
                },
                "queryHash" : "10B20F88",
                "planCacheKey" : "B9C95AB7",
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "subdoc.a" : 1,
                                        "subdoc.b" : 1
                                },
                                "indexName" : "subdoc.a_1_subdoc.b_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "subdoc.a" : [ ],
                                        "subdoc.b" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "subdoc.a" : [
                                                "[220.0, 220.0]"
                                        ],
                                        "subdoc.b" : [
                                                "[440.0, 440.0]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "mdb1",
                "port" : 27017,
                "version" : "4.2.2-3",
                "gitVersion" : "2cdb6e50913583f627acc5de35dc4e04dbfe196f"
        },
        "ok" : 1
}

Great, it’s an IXSCAN! The new index is used and the query will run faster.

Conclusion

An index on a subdocument can be used only in the case of an exact match filter. For any other kind of queries against only a few fields in the subdocument, we must provide additional simple or compound indexes.

So, the index on a subdocument is useless? No, it’s not. It depends on the queries you need to run. If you always have exact match filters then the index is useful. Also, you can create the index in combination with the unique clause if you need to assure the uniqueness of the subdocument. By the way, in this second case, you can achieve the same even with a compound unique index on all the fields.


by Corrado Pandiani via Percona Database Performance Blog

Comments