Saturday, February 15, 2014

Simulation In Data Models - Introduce Uncertainty Into Your Life

When data models are built in Excel, assumptions are usually made. For example, take a look at this simple income statement data model (values are in thousands of dollars) and the associated calculations.

Revenue by region is fixed, the Cost of Goods are a percentage of Revenue and Overhead Expenses are fixed.

This model represents a forecast of a company’s Earnings for next year. It is static, so the result is always $170,000.

If this is a best guess at a forecast then the next level is to understand the possible variability in the model’s assumptions and results. How sensitive are Earnings to changes in Revenue?

Random Numbers

Random numbers can be used to add uncertainty to the data model. Even though Canadian Revenue is forecasted at $2,000,000, the worst case scenario is $1,500,000 and the best case scenario is $2,500,000. To reflect this in the data model, a random number in this range can be used instead of hard-coding $2,000,000. There are two common ways to implement this uncertainty in Excel:

=RANDBETWEEN(1500,2500)
=1500+RAND()*1000

The RANDBETWEEN Function in Excel will randomly select a number within the range specified, including the lower and upper limits provided. Press the F9 key to trigger a recalculation of the Workbook.

The RAND() Function generates a random number between 0 and 1 (includes 0 but not 1). This value can be multiplied by 1,000 and added to 1,500 to generate a random number between 1,500 and 2,500.

USA and International revenue can also incorporate uncertainty in their values. USA revenue will range from $600,000 to $1,000,000. International revenue will range from $100,000 to $300,000.

Simulation: An Unorthodox Use Of Data Tables

Every time the F9 key is pressed, a recalculation of the Workbook will occur but the new scenario is not being captured. In order to capture the results from each scenario, you can use Visual Basic for Applications (VBA) code, but an easier way to capture the results is to use Data Tables.

Data tables are conventionally used to substitute a value from the first column in the table into a model parameter input cell in order to change the model and capture its result. However, in simulations, data tables can be used simply to trigger a recalculation of the same model and capture the result.

To set up the data table in this example, there is a formula in cell F2 (=B18) set to the model output: Earnings. Starting on the next row, one column to the left, fill down the number of scenarios desired. In this example I would like 10,000 scenarios. Next, highlight the top row and the rest of the table down to the last scenario. From the Excel Data ribbon, look in the Data Tools group and select What-If Analysis then Data Table.

Typically, when the Data Table dialog appears, the Column input cell (column of 10,000 values) is set to a cell in the data model where the numbers 1 to 10,000 will be placed before a recalculation, but in this case, all we want is a recalculation, so select a blank cell such as $S$17. Each of the 10,000 values will be placed in cell $S$17, forcing a recalculation, then then result at the top of the Data Table (cell F2) will be captured in each row below. Click OK to run 10,000 scenarios of the data model and capture each Earnings value.

Understanding The Distribution Of Model Results

In order to understand the results, first copy and paste the Data Table as values so they will not change on the next recalculation. Next, sort the Earnings from lowest to highest. With the sorted range selected, create a Line Chart to show how the simulation results are distributed. The chart shows that there are about 900 of 10,000 scenarios where Earnings are negative, which corresponds to a 9% chance of losing money based on the current model assumptions. If the target Earnings are $300,000, we can see that about 17% of the 10,000 scenarios yielded a value of $300,000 or greater.

These concepts are even more powerful with larger data models. Also, simulation can also use bell-shaped or other distributions instead of a simple RANDBETWEEN Function. Simulation is about understanding how sensitive results are to changes in assumptions. This allows the modeller to assign probabilities to the likelihood of scenarios and take action early to change the outcome.

If you are interested in learning more about Excel Data Model Simulation, let me show your organization how it can introduce uncertainty into its Excel models. If you need a data model, database solution or analytics training, please contact:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro