Not all data is numeric, so when creating random sets of sample data it can be very useful to know how to randomly select some text data from a list.

In that post, I showed you a way to randomly select from a list using the RANDBETWEEN function. This method produced an equal chance of selecting each item in the list. But what if we want to weight those probabilities of selection so that some items are more frequently selected than others?

With this method we can create the probability distribution of an item being selected.

In this example, we are going to select from a list with 4 items. This is what’s in our **Distribution** and **List** columns in the above table.

**Red**– We want this to be selected**10%**of the time.**Orange**– We want this to be selected**15%**of the time.**Blue**– We want this to be selected**45%**of the time.**Green**– We want this to be selected**30%**of the time.

Note that our probabilities of selection need to **add to 100%**! Our **Lookup** column is then just the running total of the **Distribution** Column. This will be the key to our random selection.

## Generic Formula for Random Selection

We will use a VLOOKUP with an **approximate match** to lookup an item from our list.

```
=VLOOKUP(RAND(),DistTable,3,TRUE)
```

**DistTable**– This is our table shown above which contains our distribution and list of items to select from.**3**– We want to return the result from the third column of the DistTable which contains the list items.**TRUE**– This tells Excel to use an approximate match for the VLOOKUP.

## How Does This Work?

RAND() is an Excel function with no input parameters that will return a uniformly distributed random number between 0 and 1. We use this random number between 0 and 1 as the value to lookup in our **DistTable**. Since this lookup value is random, the result returned from the **List** column will also be random.

For example, if RAND() returned **0.1156** then the approximate match would be the **10%** in the **Lookup** column because **10% <= 0.1156 < 25%**. VLOOKUP would then return **Orange** since this is the **3rd** column in the **10%** row.

## Let’s Check if it Works!

Let’s check if it works! If we use this formula 100 times, we should see approximately 10 Red, 15 Orange, 45 Blue and 30 Green results.

We can easily create a pivot table to count the number of items of each from the list, and we can even create a nice looking PivotChart to show the results. Our results look pretty close to the distribution we created. According to the law of large numbers, the larger the number of selections we make then the closer we will get to our intended distribution.

## 0 Comments