Fuzzy Matching in Power Query for Microsoft Excel

This post has been republished via RSS; it originally appeared at: Excel Blog articles.

Imagine this. We have been given a list of products from company A and a list of products from company B.

 

Our job is to match products between the two lists. But each company might have slightly different names for the same item.

 

This means we can't just use a lookup function because we don't have exact matches. We can't even use wildcards in a lookup function because the product names aren't different in a predictable way.

 

This is where we realize we have to manually match our products from the two lists and eventually we give up and start to cry! Right?

 

No, because we now have a fuzzy match option in Power Query. Lazy people rejoice!

 

Not sure what Power Query is? Then check out this Introduction to Power Query beforehand.

 

Example

01 List of Products.png

 

In our example, we have List A and List B. These are small and only have the one column, but imagine we're working with two large datasets with hundreds or thousands or products and many other columns.

 

We can see that none of the items in each list is an exact match, but they are likely the same product.

 

For example, in list A we have "Macaroni and cheese", but in list B we have "Macaroni & Cheese". They are the same thing but slightly different.

 

Create A Query For Both Tables

Both these lists have been converted into Excel Tables named ListA and ListB. The first thing we will need to do is create a query for each table.

  1. Select a cell inside the table.
  2. Go to the Data tab.
  3. Use the From Table/Range query command.
  4. Go to the Home tab in the Power Query editor.
  5. Select Close & Load ➜ Close & Load To.
  6. Choose Only Create Connection from the Import Data menu.

Once this is done for both tables, we'll be able to merge (join) the tables in another query.

 

Create A Merge Query Using Both Tables

Power Query has an option to merge tables.

 

This allows us to join data in two tables based on a common field like an ID column, or in our case the Product name column.

 

This is where we'll also find the new fuzzy match feature.

 

02 Merge Queries.png

 

Create a Merge query.

  1. Go to the Data tab.
  2. Choose Get Data from the ribbon.
  3. Choose Combine Queries from the menu.
  4. Choose Merge.

This will open up the Merge query menu.

 

Merge Menu Settings

 

03 Merge Menu.png

 

The Merge menu is where we can enable fuzzy matching.

  1. Select the first query we want to merge. For us, this will be ListA. We also need to select the column to base the merge on, for us this is the Product name column.
  2. Select the second query we want to merge and the column to base the merge on.
  3. We also need to select what type of join to perform. There are several choice, but we'll stick with the default option to return all items from the first list and only matching items from the second list. This is also where we can select fuzzy matching as an option, check the Use fuzzy matching to perform the merge box.

Notice at the bottom of the merge menu, Power Query tells us how many matches are found. When we enable fuzzy matches, this goes from 0 of 8 to 2 of 8.

 

Fuzzy Matching Options

We've already improved our matching by just enabling the fuzzy matching option. But there are more settings we can use to help improve the matching.

 

04 Fuzzy Matching Options.png

 

Click on the collapsed Fuzzy matching options and more advanced settings will be revealed. These can help us match more items in our lists.

 

Similarity Threshold

There's a similarity threshold we can adjust to help match more items. This is a value between 0 and 1 that indicates how similar values need to be in order to match.

  • 0 means everything will match.
  • 1 means only exact matches will match.

The default value for this will be 0.8.

 

Decreasing the threshold to 0.5 will take us from 2 of 8 matches to 5 of 8 matches.

 

Decreasing the threshold further to 0.3 will get us all 8 items matching.

 

But remember, the lower the threshold, the higher the likelihood of getting unwanted matches.

 

Ignore Case

Power Query is case sensitive. This means X and x are not considered the same value. This will be true when merging tables as well.

 

Luckily there is an option to ignore case when performing a fuzzy match merge. This is enabled by default.

 

Match by Combining Text Parts

There's an option to combine text parts to find matches. This means Power Query will treat things like "birthday" the same as "birth day". This is enabled by default.

 

Maximum Number of Matches

With fuzzy matching there is the potential to match items together that shouldn't be a match.

 

This option will help prevent unwanted matches by limiting the number of matches that are returned.

 

If we set this to 1, then Power Query will only return the best match and won't return the other matches that are still above the similarity threshold.

 

By default this is left blank and will allow up to 2,147,483,647 matches. 2 billion matches should be enough for anyone.

 

Transformation Table

Fuzzy matching has a very handy feature that allows us to set a transformation table.

 

This allows us to define pairs of values we consider the same and Power Query will treat them as equivalent during the fuzzy merge.

 

A transformation table can be created by creating a table with two column (a From and a To column).

 

05 Transformation Table.png

 

The above table will tell Power Query that Mobile and Cell should be considered the same thing when matching.

 

Expand Merged Data In The Power Query Editor

 

06 Expand Merged Table Column.png

 

When we press the OK button in the merge menu, this will open up the Power Query editor.

 

We will see our ListA Product data and a ListB column which contains a bunch of tables. These tables contain our merged data from ListB.

 

We need to expand out this ListB column to get the data. Click on the expand toggle in the right of the column heading and press the OK button.

 

07 Fuzzy Match Results.png

 

We can now Close & Load our results into a table in our workbook to get our matched results.

 

Conclusions

Matching data is a very common task.

 

Up until now, we only had ways to find exact matches.

 

Partial matching either required a lot of manual work or using wildcard characters if our data had predictable differences.

 

Now we have a quick and easy way to match data even when there isn't a common pattern to the differences.

 

What a time saver! I only wish I had this years ago.

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.