Speaking with the accountants, they let me know that the highest value transactions should be investigated. Also, I should note that when selecting a random sample of transactions from a list, the subset of transactions should be a representative sample of the entire list. Statistics can be used to figure out how many transactions are enough to gather a representative sample, but this article will focus on how to generate the list of unique random numbers once you know how many you need.
The Easy Method
The good thing about the easy method is that it can be done quickly and easily, it just doesn't directly create a list of unique random transaction numbers. List the transactions in an Excel Worksheet, then create a new column to the right of the data set. In the first cell at the top of the column, type the column label as "Random Number". In the cell below type the following formula: =RAND() then hit enter. A random number between 0 and 1 is generated and every time you hit the F9 key or change a cell on the Worksheet, a new random number between 0 and 1 will appear. If you copy and paste this formula down to the end of the transaction list, you can assign a random number to each transaction. At this point, you should copy this new column then paste as values so that the numbers won't change. Now sort the list of transactions based on these random values. You can take the first n rows to create a random subset of transactions.
The Hard Method
Now, to answer the question directly, some formula magic is required. In fact, an array formula is needed to create a unique list of random transaction numbers. To start with, we need a lower and upper boundary of transaction numbers. Type “Lower Bound” in cell C1 and 70 in cell D1. Type “Upper Bound” in cell C2 and 79 in cell D2.
Place a label in cell A1 that says "Unique Random Numbers". Now, in cell A2, type the following formula: =RANDBETWEEN(D1,D2) then press enter. This will generate a random number without decimal places from 70 to 79. Pressing the F9 key or changing a cell on the worksheet will generate another random number within the transaction number boundaries.
Now, an array formula needs to be written in cell A3 that respects the transaction number range, is aware of what is in the cell above, and chooses a random number from the available remaining numbers. The magic formula is:
=LARGE(ROW(INDIRECT($D$1&":"&$D$2))*NOT(COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2)))),RANDBETWEEN(1,($D$2-$D$1)-ROW(A3)+3))
When entering the formula, make sure to hold down Ctrl+Shift while pressing the Enter key. This is a Ctrl+Shift+Enter array formula.
To understand the formula better, it should be explained step-by-step, so start with:
ROW(INDIRECT($D$1&":"&$D$2)).
When the numbers from D1 and D2 are placed in this formula, the result is each number from 70 to 79 in an array: {70,71,72,73,74,75,76,77,78,79}. We want to select a random number from this array, but not the number that appears above in cell A2, so this array needs to be multiplied by an array of 1’s and 0’s.
COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2))) is the next part of the formula and is tricky to understand. The expandable range to search is A$2:A2, which contains one random number between 70 and 79. The second part of the COUNTIF performs a search for all 10 numbers between 70 and 79. Only one of these 10 numbers can be found in cell A2. If A2 contains the number 79, then this COUNTIF formula will return an array of counts that look like {0,0,0,0,0,0,0,0,0,1}. This array shows us what has been eliminated by the previously generated random number, but we want to know which numbers are still available for selection, so put this COUNTIF inside a NOT function to reverse the logic and switch the 0’s and 1’s to {1,1,1,1,1,1,1,1,1,0}.
=ROW(INDIRECT($D$1&":"&$D$2))*NOT(COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2)))) multiplies the two arrays together. In this example, it would look like {70,71,72,73,74,75,76,77,78,0} because 79 is already selected in cell A2. The next step is to select the first or second or third or maybe the ninth largest number in this array. Place this formula inside a LARGE function to choose, let’s say, the fourth largest number from the remaining numbers: =LARGE(ROW(INDIRECT($D$1&":"&$D$2))*NOT(COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2)))),4)
To get the LARGE formula working, make sure to hold down Ctrl+Shift while pressing the Enter key.
Of course we don’t just want the fourth largest number, we want to randomly choose from the 9 remaining numbers. Replace the 4 with RANDBETWEEN(1,9). Make sure to press Ctrl+Shift+Enter. This formula works well for cell A3, but when it gets copied down to cell A4, there will only be 8 numbers to choose from, so the RANDBETWEEN(1,9) needs to change to RANDBETWEEN(1,($D$2-$D$1)-ROW(A3)+3). The 9 has been replaced with a formula that calculates the difference between the upper and lower bound then subtracts the current row and adds back 3. Ctrl+Shift+Enter this formula.
If you want to follow the action inside the formula using Excel, select cell A3 then click on the Formulas tab and the Evaluate Formula button. Next, press the Evaluate button repeatedly to show the step-by-step calculation.
With cell A3 selected, copy the cell then select A4:A7 and paste the formula. The $ symbols were carefully placed in the formula to make sure that the formula works when copied down. You should always have six unique numbers between 70 and 79. You can change which six numbers appear by pressing the F9 key.
Once you are confident that there will never be a duplicate number, change the values in cells D1 and D2 to a range of transaction numbers. Also, copy and paste the formula in cell A3 down for as many unique random numbers as you want.
If you have any questions for future articles, would like to know more about using Excel, Access Databases, VBA, Charts, need a dashboard, data model, database solution or analytics training, please contact Scott Cuthbertson:
Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/
Scott Cuthbertson BBA
Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.
© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro