How to Group Similar Items in Microsoft Excel [Fuzzy Grouping]

Do you need to group similar items in Excel?

Excel has a lot of ways to group items that are the same, but these rely on values being an exact match.

Data might be considered the same even though they are not an exact match. Grouping these similar items can be a difficult and often times manual task.

Excel does actually have a feature that can group these close but not quite the same values together! It’s just a bit hidden.

Power Query has a fuzzy grouping feature for just such instances. This will allow you to group items based on a similarity threshold as well as set up a translation table for different terms that mean the same thing.

This post will show you how to use Power Query to group all your related values. Get your copy of the example workbook used in this post to follow along!

Add Data to a Table

This example shows a list of questions. You can see that while all the items in the list are distinct, some of them are similar and need to be grouped together.

The first thing you will need to do is put the data you want to group into a table.

This will make it easy to import the data into power query and use the fuzzy grouping capabilities.

  1. Select any cell inside your data.
  2. Go to the Insert tab of the ribbon.
  3. Click on the Table command.

This will open the Create Table menu with your data range selected.

  1. Check the My table has headers option.
  2. Press the OK button.

The data is inside a table object and will be ready to import into power query.

Make sure you rename the table to something short and descriptive such as Questions.

Create a Translation Table

Before importing the data into power query and grouping similar item, it’s worth setting up a translation table.

This is a table you can use to build a custom mapping to help the fuzzy matching process.

This can be used to tell Excel certain text means the same thing. For example, in the context of your data, the words email and message might be the same thing and should be grouped together.

The translation table needs to have a From and To column headings. Values in the From column will then be considered the same as the value in the To column within a row.

This should also be in an Excel table.

Import Tables to Power Query

Now that you have your data to group and translation map in tables, these can be imported to power query.

Here’s how to import a table to power query.

  1. Select the table.
  2. Go to the Data tab.
  3. Click on the From Table/Range option.

Import both tables to power query and you will be able to group the data.

Group Data from the Transform Tab

The grouping of similar items will eventually use the Table.FuzzyGroup() function in the M code.

This isn’t availalbe from the query editor user interface.

You can set up a regular grouping from the UI which will automatically build you a Table.Group() formula.

This will have the same argument signature as the Table.FuzzyGroup() formula except for the last argument which deals with the fuzzy grouping options.

It will be easier to build the regular grouping formula from the UI and then edit it to the required fuzzy grouping formula.

  1. Select the column to group within your dataset. In this example, there is only one column.
  2. Go to the Transform tab.
  3. Click on the Group By option.

This will open the Group By menu with the Basic option and the column to be grouped already selected.

  1. Add a New column name to the input box.
  2. Select the All Rows option as the Operation. This will allow you to later expand the grouped data.
  3. Press the OK button.
= Table.Group(#"Changed Type", {"Item"}, {{"Data", each _, type table [Item=nullable text]}})

This will result in the above M code formula.

This will perform an exact match grouping. Depending on your dataset, this might not change anything other than adding a column where each row has a Table.

💡 Tip: You can find out more about the Table.Group function here.

Edit the Group Step to a Fuzzy Group

= Table.FuzzyGroup(#"Changed Type", {"Item"}, {{"Data", each _, type table [Item=nullable text]}},[IgnoreCase=true, IgnoreSpace=true, Threshold=0.8, TransformationTable=Translation])

Click into the formula bar and edit the resutling formula by adding the above parts seen in red.

In this case, Translation is the name of the query with the translation table containing your From and To columns.

This will change the function used to the Table.FuzzyGroup() function. The first three arguments are the same and a fourth argument is added at the end to specifcy the fuzzy groupinng options.

This last option argument is a record with these fields.

  • IgnoreCase: This can be either true or false. Setting this as true will allow for a case-insensitive grouping. For example, Email and email are grouped together.
  • IgnoreSpace: This can be either true or false. Setting this as true will ignore any space characters when grouping. For example, email and e mail are grouped together.
  • Threshold: This is a number between 0 and 1 that specifies the similarity score at which two values will be grouped. A threshold of 0 would group everything together while a threshold of 1 would require an exact match for grouping.
  • TransformationTable: This is a table that maps From values to an equivalent To value in the grouped results.

For more details on the Table.FuzzyGroup function you can view the Microsoft documentation.

Press the Enter key or click on the Check to the left of the formula bar to accept the updated formula.

This will perform the fuzzy grouping and group similar items while taking into account the mapping in your translation table!

Again, you will see a new column with a Table in each row. Each of these tables contains the grouped data and you can see a preview when you click on the empty part of the cell.

Now you can expand these tables out.

  1. Click on the Expand toggle in the Grouped column heading.
  2. Select the Expand option.
  3. Press the OK button.

You will see all the data now with values grouped based on the Item column.

💡 Tip: Review the fuzzy grouping results and then adjust the IgnoreCase, IgnoreSpace, and Threshold options to improve the grouping.

Load Data to Excel

This grouped data can now be loaded to Excel.

  1. Go to the Home tab of the power query editor.
  2. Click on the Close and Load button.
  3. Choose the Close and Load To option.
  1. Choose to load the data into a Table.
  2. Select the location to load the data.
  3. Press the OK button.

Now your similar items are grouped in a new Excel table!

Conclusions

Grouping items that are similar but not exactly the same is a difficult task that most users will approach manually.

Power Query does offer a fuzzy grouping function, but it’s not accessible from the query editor user interface.

Performing a regular grouping and then adjusting the resulting M code to a fuzzy grouping is a fairly easy way to implement the function due to their similar argument signature.

Did you know about the fuzzy grouping function available in power query for Excel? Let me know in the comments!

About the Author

John MacDougall

John MacDougall

John is a Microsoft MVP and qualified actuary with over 15 years of experience. He has worked in a variety of industries, including insurance, ad tech, and most recently Power Platform consulting. He is a keen problem solver and has a passion for using technology to make businesses more efficient.

Related Posts

Comments

4 Comments

  1. Paul Curran

    Didn’t know about fuzzy grouping. It’s a neat tool for muddy data that we often see.

    • John MacDougall

      It can be a great help, but always review the results!

  2. Samantha T

    I’m trying to follow along with this guide, but when I input the “[IgnoreCase=true, IgnoreSpace=true, Threshold=0.8, TransformationTable=Translation])” and FuzzyGroup line, I get an error that says Translation wasn’t recognized.
    Error: “Expression.Error: The name ‘Translation’ wasn’t recognized. Make sure it’s spelled correctly”

    Code: “= Table.FuzzyGroup(#”Changed Type”, {“Sold To: Account Name”}, {{“Grouped”, each _, type table [#”Sold To: Account Name”=nullable text]}},[IgnoreCase=true, IgnoreSpace=true, Threshold=0.8, TransformationTable=Translation])”

    Can you explain what I’m doing wrong or what I need to change?

    • John MacDougall

      Sounds like you don’t have a translation table or it’s named differently? Update the name.

Get the Latest Microsoft Excel Tips

Follow Us

Follow us to stay up to date with the latest in Microsoft Excel!

Subscribe for awesome Microsoft Excel videos 😃