New string function performance improvements and case-insensitive search

This post has been republished via RSS; it originally appeared at: Microsoft Developer Blogs.

Querying in Azure Cosmos DB just got even better! You now have an option for case-insensitive queries with the following string search system functions: Contains, EndsWith, StartsWith, and StringEquals. Additionally, both Contains and EndsWith have significant performance improvements. This update was rolled out to Azure Cosmos DB core (SQL) API accounts in our most recent service update. Customers with Azure Cosmos DB API for MongoDB accounts can take advantage of the case-insensitive support and performance improvements through the $regex query evaluation operator.

Testing performance

Let’s test the performance improvements by running some queries on a sample dataset with 8.5 million documents. I’ve uploaded the dataset to a Cosmos container with 30,000 RU/s of provisioned throughput. The dataset was generated using Bogus, and each document contains a unique id, a name, an address, a company, and a job.

Contains

Let’s first search for all the people who live on a street named “Brooks Street”. We’ll run a query that uses Contains to check whether the address property contains the string “Brooks Street”. The query returns 17 results with the following RU charges:

Original RU charge: 221,566.68 RUs
New RU charge: 224.99 RUs

The performance improvement for Contains gave this query a decrease of more than 99% in RU charge! Aside from indexing the property used in the Contains system function, there are no other changes you need to make to see these improvements.

EndsWith

Let’s look at an example with EndsWith: a query that finds all the people whose name ends in “Lee”. This query returns 0 results with the following RU charges:

Original RU charge: 198,649.32 RUs
New RU charge: 122.76 RUs

StartsWith

Additionally, you can take advantage of a new parameter in each of these system functions to get case-insensitive support. This parameter is optional and defaults to false (case-sensitive matching) when unspecified. The RU charge for Contains and EndsWith is the same whether or not the match is case-insensitive.
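As a rough sketch of what these queries might look like, the strings below assume top-level string properties named address, name, and job (hypothetical names; the Bogus-generated documents may nest them differently). Passing true as the optional third argument requests a case-insensitive match. The Python functions are a hypothetical in-memory model of the matching semantics only; they say nothing about how the service evaluates queries or computes RU charges.

```python
# Hypothetical reconstructions of the example queries. Property names are assumptions.
CONTAINS_QUERY = 'SELECT * FROM c WHERE CONTAINS(c.address, "Brooks Street")'
ENDSWITH_QUERY = 'SELECT * FROM c WHERE ENDSWITH(c.name, "Lee")'
# The third argument (true) asks for a case-insensitive match; it defaults to false.
STARTSWITH_QUERY = 'SELECT TOP 100 c.job FROM c WHERE STARTSWITH(c.job, "developer", true)'
STRINGEQUALS_QUERY = 'SELECT TOP 100 c.job FROM c WHERE STRINGEQUALS(c.job, "developer", true)'


def _fold(s: str, ignore_case: bool) -> str:
    # Assumption: str.lower() stands in for the service's case folding,
    # whose exact rules may differ (e.g. for non-ASCII text).
    return s.lower() if ignore_case else s


def contains(s: str, sub: str, ignore_case: bool = False) -> bool:
    return _fold(sub, ignore_case) in _fold(s, ignore_case)


def ends_with(s: str, suffix: str, ignore_case: bool = False) -> bool:
    return _fold(s, ignore_case).endswith(_fold(suffix, ignore_case))


def starts_with(s: str, prefix: str, ignore_case: bool = False) -> bool:
    return _fold(s, ignore_case).startswith(_fold(prefix, ignore_case))


def string_equals(a: str, b: str, ignore_case: bool = False) -> bool:
    return _fold(a, ignore_case) == _fold(b, ignore_case)


# Hypothetical documents, loosely shaped like the Bogus-generated dataset.
people = [
    {"id": "1", "name": "Ana Lee", "address": "12 Brooks Street", "job": "Developer"},
    {"id": "2", "name": "Sam Leeds", "address": "4 brooks street", "job": "developer advocate"},
    {"id": "3", "name": "Kim Park", "address": "9 Oak Avenue", "job": "Designer"},
]

# Default (case-sensitive) Contains only matches the exact casing.
case_sensitive_hits = [p["id"] for p in people if contains(p["address"], "Brooks Street")]
# With ignore_case=True, "4 brooks street" matches as well.
case_insensitive_hits = [p["id"] for p in people if contains(p["address"], "Brooks Street", True)]
ends_hits = [p["id"] for p in people if ends_with(p["name"], "Lee")]
starts_hits = [p["id"] for p in people if starts_with(p["job"], "developer", True)]
equals_hits = [p["id"] for p in people if string_equals(p["job"], "developer", True)]

print(case_sensitive_hits, case_insensitive_hits, ends_hits, starts_hits, equals_hits)
```

The model makes the distinction between the functions concrete: StartsWith and StringEquals anchor the match at the start of the string, while Contains and EndsWith must consider every position, which is consistent with the observation that the latter two tend to cost more RUs.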
Here’s an example with StartsWith: a query that finds the TOP 100 job titles starting with “developer”, whether or not the case matches.

RU charge: 38.91 RUs

StringEquals

If you want to check for a full match instead, you can use StringEquals.

RU charge: 38.89 RUs

The RU charge for StartsWith and StringEquals is slightly higher with the case-insensitive option than without it. In general, Contains and EndsWith consume more RUs than StartsWith or StringEquals, and their RU charge increases as the cardinality of the property used in the system function increases. Learn more about Contains and EndsWith index utilization.

Next steps

We hope you try out these new query features! Here’s a query lab with some sample data to get started. Beyond adding an index for the properties used in the system functions, there is nothing else you need to do to take advantage of these significant improvements! If you are interested in additional text search functionality, you can use Azure Cosmos DB's integration with Azure Cognitive Search.

If you have existing containers that use these system functions, check their Request Unit (RU) consumption. With these recent optimizations, your total RU consumption may have decreased, and you may be able to lower the provisioned throughput on these containers.
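One way to quantify the change for your own queries is to compare the request charge of the same query before and after the update (Cosmos DB reports each operation's charge, for example in the x-ms-request-charge response header). The sketch below is a hypothetical helper, not part of any SDK; it simply applies the percentage-decrease arithmetic to the Contains and EndsWith charges reported in the tests above.

```python
def percent_decrease(before_ru: float, after_ru: float) -> float:
    """Return the relative drop in request charge, as a percentage."""
    if before_ru <= 0:
        raise ValueError("before_ru must be positive")
    return (before_ru - after_ru) / before_ru * 100


# Charges reported in this post: (original RU charge, new RU charge).
measurements = {
    "Contains (Brooks Street)": (221_566.68, 224.99),
    "EndsWith (Lee)": (198_649.32, 122.76),
}

for query, (before, after) in measurements.items():
    print(f"{query}: {before:,.2f} -> {after:,.2f} RUs, "
          f"{percent_decrease(before, after):.2f}% lower")
```

Both reported queries come out at roughly a 99.9% reduction, which matches the "more than 99%" figure given for Contains.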
