Monday, December 01, 2014

Selecting Random Transactions - Generating Random Numbers for Auditing

I taught a class recently and one of the students wanted to know how to randomly sample transactions from a list for auditing. He phrased the question this way: "How can I generate a list of random transaction numbers and make sure they don't repeat". Immediately, I thought of a really easy way to answer the question. But to give the literal answer to that question, it takes more effort.

Speaking with the accountants, they let me know that the highest value transactions should be investigated. Also, I should note that when selecting a random sample of transactions from a list, the subset of transactions should be a representative sample of the entire list. Statistics can be used to figure out how many transactions are enough to gather a representative sample, but this article will focus on how to generate the list of unique random numbers once you know how many you need.

The Easy Method

The good thing about the easy method is that it can be done quickly and easily, it just doesn't directly create a list of unique random transaction numbers. List the transactions in an Excel Worksheet, then create a new column to the right of the data set. In the first cell at the top of the column, type the column label as "Random Number". In the cell below type the following formula: =RAND() then hit enter. A random number between 0 and 1 is generated and every time you hit the F9 key or change a cell on the Worksheet, a new random number between 0 and 1 will appear. If you copy and paste this formula down to the end of the transaction list, you can assign a random number to each transaction. At this point, you should copy this new column then paste as values so that the numbers won't change. Now sort the list of transactions based on these random values. You can take the first n rows to create a random subset of transactions.

The Hard Method

Now, to answer the question directly, some formula magic is required. In fact, an array formula is needed to create a unique list of random transaction numbers. To start with, we need a lower and upper boundary of transaction numbers. Type “Lower Bound” in cell C1 and 70 in cell D1. Type “Upper Bound” in cell C2 and 79 in cell D2.

Place a label in cell A1 that says "Unique Random Numbers". Now, in cell A2, type the following formula: =RANDBETWEEN(D1,D2) then press enter. This will generate a random number without decimal places from 70 to 79. Pressing the F9 key or changing a cell on the worksheet will generate another random number within the transaction number boundaries.









Now, an array formula needs to be written in cell A3 that respects the transaction number range, is aware of what is in the cell above, and chooses a random number from the available remaining numbers. The magic formula is:
 =LARGE(ROW(INDIRECT($D$1&":"&$D$2))*NOT(COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2)))),RANDBETWEEN(1,($D$2-$D$1)-ROW(A3)+3))
When entering the formula, make sure to hold down Ctrl+Shift while pressing the Enter key. This is a Ctrl+Shift+Enter array formula.

To understand the formula better, it should be explained step-by-step, so start with:
ROW(INDIRECT($D$1&":"&$D$2)).
When the numbers from D1 and D2 are placed in this formula, the result is each number from 70 to 79 in an array: {70,71,72,73,74,75,76,77,78,79}. We want to select a random number from this array, but not the number that appears above in cell A2, so this array needs to be multiplied by an array of 1’s and 0’s.

COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2))) is the next part of the formula and is tricky to understand. The expandable range to search is A$2:A2, which contains one random number between 70 and 79. The second part of the COUNTIF performs a search for all 10 numbers between 70 and 79. Only one of these 10 numbers can be found in cell A2. If A2 contains the number 79, then this COUNTIF formula will return an array of counts that look like {0,0,0,0,0,0,0,0,0,1}. This array shows us what has been eliminated by the previously generated random number, but we want to know which numbers are still available for selection, so put this COUNTIF inside a NOT function to reverse the logic and switch the 0’s and 1’s to {1,1,1,1,1,1,1,1,1,0}.

=ROW(INDIRECT($D$1&":"&$D$2))*NOT(COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2)))) multiplies the two arrays together. In this example, it would look like {70,71,72,73,74,75,76,77,78,0} because 79 is already selected in cell A2. The next step is to select the first or second or third or maybe the ninth largest number in this array. Place this formula inside a LARGE function to choose, let’s say, the fourth largest number from the remaining numbers: =LARGE(ROW(INDIRECT($D$1&":"&$D$2))*NOT(COUNTIF(A$2:A2,ROW(INDIRECT($D$1&":"&$D$2)))),4)
To get the LARGE formula working, make sure to hold down Ctrl+Shift while pressing the Enter key.

Of course we don’t just want the fourth largest number, we want to randomly choose from the 9 remaining numbers. Replace the 4 with RANDBETWEEN(1,9). Make sure to press Ctrl+Shift+Enter. This formula works well for cell A3, but when it gets copied down to cell A4, there will only be 8 numbers to choose from, so the RANDBETWEEN(1,9) needs to change to RANDBETWEEN(1,($D$2-$D$1)-ROW(A3)+3). The 9 has been replaced with a formula that calculates the difference between the upper and lower bound then subtracts the current row and adds back 3. Ctrl+Shift+Enter this formula.

If you want to follow the action inside the formula using Excel, select cell A3 then click on the Formulas tab and the Evaluate Formula button. Next, press the Evaluate button repeatedly to show the step-by-step calculation.

With cell A3 selected, copy the cell then select A4:A7 and paste the formula. The $ symbols were carefully placed in the formula to make sure that the formula works when copied down. You should always have six unique numbers between 70 and 79. You can change which six numbers appear by pressing the F9 key.












Once you are confident that there will never be a duplicate number, change the values in cells D1 and D2 to a range of transaction numbers. Also, copy and paste the formula in cell A3 down for as many unique random numbers as you want.

If you have any questions for future articles, would like to know more about using Excel, Access Databases, VBA, Charts, need a dashboard, data model, database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro



Saturday, November 01, 2014

Dynamic Ranges, Volume 2 - Drop-Down Menus That Change Based On Selections

Last month we looked at an example of dynamic ranges using a chart that updates automatically based on how much data is available. Since dynamic ranges are of great importance in data modelling and analysis tool-building, I want to share another example.

I have assisted many retailers with data models that involve product hierarchies. One of the most common questions I have heard is how to allow a user to make a selection in one drop-down menu and then have a second drop-down menu change to reflect allowable selections.

In this example, a fashion retailer has three departments (Women, Men and Kids) and within each are categories. Let’s take a look at the data:















There are 8 categories of merchandise in the Women’s department, 6 categories in Men’s and 5 categories in Kids.

The goal is to create two drop-down menus. The first menu allows a user to select one of the three departments and the second menu will automatically allow category selections from the chosen department.

The Department Drop-Down Menu

Excel has a great feature for validating data input called Data Validation. This button is located on the Data tab of the ribbon. If you place the selection in cell H2 then click on Data Validation, the Data Validation dialog box will appear. On the Settings tab is an Allow: drop-down and it should be set to List in order to create a list of allowable departments. Next, the Source textbox needs to know where to get the values for the list, so select the cell range =$A$1:$C$1. Click OK and now cell H2 is a drop-down menu that only accepts one of the three department names. Type “Department” in cell G2 as a label for the drop-down menu.


















Two Important Things To Know

In order simplify this example, now is a good time to capture two pieces of information. The selected department appears in which column? How many categories exist for the selected department?

To figure out which column contains the selected department information, type the following function in cell E2: =MATCH(H2,A1:C1,0). This function will lookup the selected department in cell H2 and then search the range A1:C1 to find the value and return its position in the range. The third argument is a zero and it asks for an exact match. If you select the Kids department in cell H2, then cell E2 should return a 3.

To count the number of categories in the selected department, type the following function in cell E3: =COUNTA(INDEX(A2:C9,0,E2)). This function will look at the A2:C9 range then return all rows (the number 0 represents all rows) for the column number in cell E2. For this dynamic cell range, the COUNTA function will count how many cells contain a category value. If you select the Men’s department, then cell E3 should return a 6.



The Category Drop-Down Menu

The final task is to create a category drop-down menu that understands which department was selected. Select cell H3 so that another data validation list will appear there. Click on Data Validation and from the Settings tab allow another List. The Source this time will not be a fixed range such as =$A$1:$C$1 but will be a dynamic range. An OFFSET function can be typed into the Source text box. Type in the following: =OFFSET($A$1,1,$E$2-1,$E$3,1). This function starts at cell A1, then moves down one row, then moves to the right E2-1 columns. Once at this cell, the dynamic range will contain the number of rows from E3 with 1 column of width. If you make a selection from the category drop-down menu, you should only be able to choose categories within the previously selected department. Add a label in cell G3 for “Category”.

When you test the two drop-downs, the finished example should look like this:














If you have any questions for future articles, would like to know more about using Excel, Access Databases, VBA, Charts, need a dashboard, data model, database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro



Saturday, October 04, 2014

Dynamic Ranges, Volume 1 - Charts That Automatically Grow When Data Is Added

In analytics, dynamic ranges are important when working in Excel because they give a data model the flexibility to adjust to changes such as adding or deleting data.

It is generally frowned upon in Excel data models to hard-code values; it is good practice to refer to cells that contain input values so that the model can be easily updated by changing the input-cell values.

However, there is another type of hardcoding happening in Excel models – the static range. Most often, it is fine to type in a formula such as =SUM(B3:B14), but if you want the range to expand with new data in cell B15, you need a range that is dynamic.

A good introduction to dynamic ranges is the example of a chart that requires monthly updates. Here is some monthly sales data to create a Clustered Column chart.














If you create a chart based on the data in cells A1:C5, it may look like the one pictured below. However, when you add data in row 6 for the month of May, the chart won't update by itself. Some may choose to manually update their ranges on a monthly (perhaps weekly or daily) basis, but this can be automated by setting dynamic ranges that grow on their own.




















Excel Tables

An underused feature of Excel is the Excel Table. A range of spreadsheet data can be converted to a table by simply selecting a cell in the data then clicking on the Insert tab - Tables group - Table button (Ctrl-T for my keyboard shortcut friends).

The data remains unchanged but now that it is in a table format, one of the newly granted abilities is for a table to automatically grow or shrink. If you add data in row 6 for May, the chart will automatically expand its range to accommodate the new month.














Named Ranges And The Offset Function

A more advanced and powerful way to set up a dynamic range is to use named ranges. We can create a named range that refers to a cell range that can change. This example begins with the January to April dataset on a worksheet called Sales, not the Excel table where May data was just added.

Click on the Formulas tab - Defined Names group - Name Manager button to open up the Name Manager dialog box. Create a new named range by clicking on the New button. The name of the range will by TYSales and instead of hardcoding a reference, we can use the OFFSET function to refer to a dynamic range. Type the following in the Refers to box: =OFFSET(Sales!$B$1,1,0,COUNT(Sales!$B:$B),1)

The five arguments in the OFFSET Function return the range B2:B5 using the following logic: start at cell B1 on the Sales worksheet, go down 1 row, go to the right 0 columns, make the height of the range equal to the count of numbers in column B, make the width of the range equal to 1 cell wide.

Once the first named range is complete, create a named range called LYSales where the reference is defined by =OFFSET(Sales!$C$1,1,0,COUNT(Sales!$C:$C),1) and another named range called Labels where the reference is defined by =OFFSET(Sales!$A$1,1,0,COUNT(Sales!$C:$C),1).

The final step will be to let a new chart know that these three named ranges, TYSales, LYSales and Labels will supply the data.

Create a new Clustered Column chart then click on the Design Tab – Data Group - Select Data button. Add a data series with a Series Name of ="TY Sales" and Series Values of =Sales!TYSales. The equals symbol is followed by the name of the worksheet, an exclamation point then the named range. Once the series is added, edit the Horizontal (Category) Axis Labels by setting the Axis Label Range to =Sales!Labels. Finally, add one more data series for last year sales. The Series Name is =”LY Sales” and the Series Values should be set to =Sales!LYSales.




















The chart is complete and uses dynamic ranges that respond to how much data is provided. Add data in rows 6 through 8 for May, June and July and watch the chart change as you complete each cell. The chart will also shrink if you delete the row at the bottom of the data multiple times.

If you have any questions for future articles, would like to know more about using Excel, VBA, Charts, need a dashboard, data model, database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro


Wednesday, September 10, 2014

Optimization Of Resources - Using Excel Solver To Aid Decision Making

When an Excel data model is created and some cells require input from the user, the scenario outcome can change for better or worse. Here is a simple example of an organic baked goods production model where there are constrained resources available to make apple pies, boxes of cookies and cinnamon buns.

The limited available organic ingredients for tomorrow’s production is 350 cups of sugar, 450 cups of flour and 400 cups of butter.

An apple pie requires 1 cup of sugar, 0.5 cups of flour and 1 cup of butter. A minimum of 10 pies must be made to fill the shelf and each pie produces $1.00 of profit.

A box of cookies requires 2 cups of sugar, 4 cups of flour and 1 cup of butter. A minimum of 30 boxes must be produced and each box yields $5.00 of profit.

A box of cinnamon buns requires 1 cup of sugar, 3 cups of flour and 2 cups of butter. A minimum of 15 boxes is required and each will produce $4.00 in profit.

Setting Up The Model
















The decision to be made is how many apple pies, boxes of cookies and cinnamon rolls to produce. These decisions are input by the user in the yellow cells C2:E2. In order to respect the minimum presentation requirement, these production numbers should be at least as big as the values in cells C4:E4. Cells C6:E8 detail how many cups of each organic ingredient are required to produce one unit of each baked good. The values in cells C10:E12 multiply the per-unit ingredient requirements by the number of units to produce in cells C2:E2 (40 boxes of cookies require 80 cups of sugar, 160 cups of flour and 80 cups of butter). Cells C14:E14 displays the per-unit profit of each baked good and C15:E15 multiplies these values by the number of units produced. Since there are only 350 cups of sugar available, this maximum is displayed in cell I10 alongside the total sugar consumed by the three products in G10. Cells G11:I12 display the cups of flour and butter used as well as the cups available.

The Initial Solutions

With a production schedule of 20 apple pies, 40 boxes of cookies and 30 cinnamon rolls, the presentation requirements are met, the three constrained ingredients are available and $340 of profit is earned.

Of course, more profit could be earned if additional baked goods are produced, but how many units of each should be added to yield the greatest profit? Since cookies have the highest profit per unit, we can change the production to 70 units. Cinnamon rolls also produce a good profit but use less flour, maybe we should increase production to 50 units. At this point we are constrained on flour and can only make a total of 40 apple pies. All of the flour has been used up but sugar and butter are still available while the profit earned is $590. Is there another solution that makes better use of the ingredients and yields more profit?

Excel Solver

This is a type of problem that is difficult to solve using trial and error but simple with Excel’s Solver Add-In. The Solver button appears in the Data tab of Excel’s ribbon. If you don’t see the Solver button, you need to enable the Add-In. Click on the File Tab then select Options and when the Excel Options dialog appears, select the Add-Ins tab. Click on the Go button to bring up a menu of Add-Ins that can be enabled or disabled, check the box for the Solver Add-In then click on OK. The Solver Add-In should be added to the Data tab in Excel, if not then quit Excel and open the Excel application again.

The Optimal Production Schedule

Click on the Solver button to set up the optimization model. The Set Objective box should be cell G15 and the To button should be Max because we want to maximize the value in cell G15, the total profit. The By Changing Variable Cells box should be cells C2:E2, the decision input cells that can
change their values. Next, click the Add button to make the Add Constraint dialog box appear. The Cell Reference of C2:E2 should be set to greater than or equal to the Constraint of C4:E4. Notice that three constraints are being specified at once instead of one cell at a time. Click Add to add another constraint. The next constraint is that G10:G12 should be less than or equal to I10:I12. The last constraint is the fact that C2:E2 must be integers; we can’t create half of a pie. In order to set this constraint, the Cell Reference is C2:E2 and the drop-down menu is set to “int” (which will then place the word “integer” in the Constraint box). Click OK to close the Add Constraint dialog box. The Solver model is ready to choose the optimal solution, so click the Solve button and when the solution is ready, click OK to return to the model.

The Optimal Solution
















Excel Solver has found an interesting solution where 216 apple pies, 30 boxes of cookies and 74 cinnamon rolls are made. This solution makes good use of the ingredients as only 6 cups of butter will remain. The profit of this solution is $662, higher than what we could likely produce with trial and error. It seems counterintuitive that production of the highest per-unit profit item (cookies) is low but that makes sense considering that cookies use more flour than the other baked goods. If making 216 apple pies is unreasonable and risks spoilage, a new constraint could be added to this model that specifies a maximum level of production for each baked good.

Whether you need to allocate different products to stores at varying prices or allocate hours to employees with varying wage costs and sales performance, models such as this one can be created to maximize profits or minimize costs given a set of constraints.

If you have any questions for future articles, would like to know more about using Solver, Excel, VBA, Charts, need a dashboard, data model, database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro


Saturday, July 19, 2014

SELECT FROM WHERE? - SQL for Business Professionals

The language of databases is SQL (pronounced Ess-Cue-Elle) – Structured Query Language. Being able to have a basic conversation in this language allows you to engage with the world’s databases and those who create and maintain them. Every year, more of the world’s business data is stored in tables. To discover the patterns and insights buried in the data, queries can be written in SQL to ask a question that the database will answer.

Once the exclusive domain of Database Developers, Database Administrators and Information Technology Professionals, SQL is now being used by more business professionals to have a competitive edge and direct access to databases. Direct access allows data analysis to occur organically as business questions are answered immediately, probably generating deeper questions.

I used to write SQL statements in the 1990’s and the great thing about SQL is that those same statements still work today. SQL is a very stable language across time, which is welcome when technology is constantly changing. Also, since SQL is also the most widely used method of communicating with databases, the same SQL statement can often work across different databases (Microsoft Access, SQL Server, Oracle…).. However, in more complex queries, small changes in syntax may be required.

When there is a lot of data, Excel won’t be nearly as fast as SQL at manipulating and summarizing data to yield insights. Excel is better for working with the results of a SQL query.

The best part about SQL is that it is really easy to learn. A sample query will illustrate the basic syntax of asking “How many recent orders contain more than one product, can you show me those orders?”

SELECT OrderID, COUNT(ProductID), SUM(UnitQuantity)
FROM tblOrderDetails
WHERE OrderID>=17239
GROUP BY OrderID
HAVING COUNT(ProductID)>1
ORDER BY SUM(UnitQuantity) DESC

The FROM Clause

Even though the SELECT clause is written first, we start with the FROM clause to understand the query. The FROM clause indicates where the data is coming from. In this case it is a table called “tblOrderDetails” which contains data about each product on each order a company fulfills. The picture below shows some sample records from the table. Each row in this table represents one product from one order and will contain data elements such as: Order Identification Number, Product Ordered, Unit Quantity of Product, Extended Cost of Units and Extended Price of Units. More complex queries will combine data from multiple tables.
















The WHERE Clause

Once we know where the data is coming from, we can filter the individual records to exclude orders we don’t want and include orders we do. In this query we want the most recent orders; those with an order identification number of 17239 or greater. All the records with lower order identification numbers are discarded at this point.

The GROUP BY Clause

Now that only some of the rows have survived to this point, it is time to summarize these records. I think that GROUP BY is the most powerful clause in SQL because it has the power to summarize many records into a total. In this query, we want to create a summary group for each order identification number. Order number 17239 has three products ordered and therefore three records in the table. These three records will be aggregated to one row at the order level so that we can later count the number of products (3) and sum the unit quantity (14).

The HAVING Clause

The HAVING clause is similar to the WHERE clause, it is a filter. The difference is that the WHERE clause was fed individual records from a table and HAVING is fed summarized groups (one row per order). HAVING occurs after the GROUP BY clause, so at this point, summary groups are created and we can filter out summary groups that we want to exclude. In this query, we want to keep the orders that have more than one product ordered. Any groups with only one product ordered are now discarded from the result (such as order numbers 17240 and 17241).

The ORDER BY Clause

Now that we know which groups will appear in the final result, we can present them with a certain sort order. In this query, we are going to perform a descending sort on the sum of unit quantity for each order. This means that the order with the most units will appear as the first row and the order with the fewest units will appear at the bottom.

The SELECT Clause

Now that the groups are created and sorted, we can choose which columns will be returned in the result. In this query we want to see the order identification number, followed by the count of the number of products and the sum of the units for that order.

Upcoming Course

If you would like to know more about using SQL to interact with databases, I teach a one-day course called: Introduction to SQL for Finance Professionals. The course goes into much greater detail and will have you writing SQL and returning results from databases. Join me at the Certified Management Accountants Professional Development Institute.

If you are interested in learning about Excel, VBA, Charts, need a dashboard, data model, database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro


Saturday, June 14, 2014

Creating Excel Charts Using VBA - A Picture Is Worth 1,000 Words

Most people struggle to see the patterns and relationships present when looking at a table of data. For example, take a look at this table which details the deposits, branches and employees for some Canadian Banks. Without spending too much time scrutinizing the numbers, can you tell which banks are the most similar?















Using A Chart To Visualize The Data

If you want to effectively communicate a message to an audience, a chart can let the data quickly speak for itself. A basic Excel scatter chart with branches on the vertical axis and employees on the horizontal axis will give you a better idea of which banks are most similar. Unfortunately, we can’t easily tell which bank is which and deposits have not been reflected in the chart.



Formatting the chart as a bubble chart will make the visualization more presentable. This chart not only shows the number of branches and employees for each bank, but the size of the logo indicates the magnitude of the deposits. This chart quickly conveys the relationships and patterns of three variables. Now you can group similar banks together instantly.



Building Charts With Excel Visual Basic For Applications (VBA)

This chart was built using VBA but it could have been created manually. For this particular chart, writing the code and building the chart manually may take about the same amount of time, but coding offers advantages. First, the code can be reused to create similar charts without having to rebuild the chart manually. Second, there were only 11 points on the chart, but if there were more points, it would take too long to build manually.

Animating Benchmarking Charts

One of the more popular charts I have offered clients is the Benchmarking Chart. It is a good visualization for comparing a company or employee to peers along several metrics. This chart can be created manually or using VBA. However, if you want to see this chart progress from period 1 through 12 automatically, you will need to create a motion chart using VBA.

Upcoming Course

If you would like to know more about using Excel VBA code to create charts, I will be teaching a one-day course called: Microsoft Excel VBA – Focus on Charts & Visualizations. Join me at the Certified Management Accountants Professional Development Institute. The charts above will be covered in the course and will give you the foundation to invent new charts to communicate your messages.

To give you an idea of what is possible after practicing Excel charting with VBA, here are some examples of complex charts which are only feasible with VBA.





If you are interested in learning more about Excel charts or VBA, need a data model, a database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro


Monday, May 19, 2014

Why is my VLOOKUP not working? - Four Opportunities To Improve Your VLOOKUP

This past week at a client site, I was asked a question that I have heard numerous times: Why is my VLOOKUP not working? I have seen issues with all of the four arguments required by this function and each presents an opportunity to improve your data model.

lookup_value

I will keep the data set simple: population and median income for each city in the United States. This data set contains 503 cities.

Elsewhere in this spreadsheet, we need to lookup the population and median income of two cities: Buffalo and Hartford.

The text “Buffalo” is located in cell L2 and the VLOOKUP function to lookup the population will be written in cell M2.

The formula can be written as:
=VLOOKUP(L2,A2:C504,2,FALSE)

The first argument in the function is the lookup_value and represents what we want to look for. In this case we are looking for “Buffalo” which is in cell L2.

The second argument is table_array and represents the data source for retrieving the population of Buffalo. The table_array is in cells A2 to C504.

The third argument is col_index_num and represents which column in the table_array to return as our result. The population is in the second column, so we specify a value of 2.

The fourth argument is range_lookup and accepts values of TRUE (1) or FALSE (0). FALSE is “exact match” and only allows an exact match of Buffalo to return a result. TRUE is “approximate match” and will return the closest value to Buffalo if the table_array is sorted in the first column (City, from A to Z).

Here is the result of the formula, an #N/A error. The problem is that in cell L2, there is a space character after “Buffalo”. This is a common problem that is harder to find when many VLOOKUP functions are written. An easy way to resolve this issue in the lookup_value argument is with the TRIM function, which will remove all spaces from a string except for single spaces between words. The formula should be rewritten as:
=VLOOKUP(TRIM(L2),A2:C504,2,FALSE)


table_array

This formula in cell M2 can now be copied and pasted down one cell to M3. The resulting formula is:
=VLOOKUP(TRIM(L3),A3:C505,2,FALSE)

This formula returns the correct result for Hartford, but the table_array is no longer correct. The data source is now cells A3 to C505. New York is no longer in the table_array and every time this formula gets pasted one row further down, another city will be removed. To improve the table_array argument, add $ symbols to make it an absolute reference.
=VLOOKUP(TRIM(L3),$A$2:$C$504,2,FALSE)

Now the table_array will always remain the same. Unfortunately, if you add data to the bottom of the table starting in row 505, it won’t be visible to the function. Therefore, another way to write this formula is by using entire columns instead of specifying rows:
=VLOOKUP(TRIM(L3),$A:$C,2,FALSE)

This formula will copy and paste downwards without losing any data and also allows the data to grow.

col_index_num

The third argument is typically hardcoded as a number, in our case the second column is represented by a 2. There are two common issues created by hardcoding column references.

The first issue is what happens when you insert a column inside your table_array cell range. The State for each city has been added to the data and the Population is no longer in column 2, it is now in column 3. Excel does not update the VLOOKUP function because the col_index_num argument has been hardcoded. The Population lookup is now returning State instead because that is what appears in column 2.







The second issue is what happens when you copy and paste the formula in cell M2 to the right. If we want to add a VLOOKUP of Median Income in column N then we should add a $ symbol to the lookup_value ($L2):
=VLOOKUP(TRIM($L2),$A:$C,2,FALSE).


When this formula is copied and pasted to the right, nothing in the formula changes, so we have to manually change the col_index_num to 3. If we had more columns in our table_array such as Crime Rate or Unemployment Rate, we would copy and paste this formula to the right two more times and have to manually enter the col_index_num. For dozens of columns, this takes too long and is a risk for errors.

To fortify the col_index_num argument and solve both of these issues, use the COLUMNS function, which will count the number of columns in the cell range it is fed.

Replace the hardcoded value of 2 in the col_index_num argument with COLUMNS($A2:B2) which will evaluate to 2 (the number of columns in the range $A2:B2). The lookup in cell M2 for Buffalo’s population should be: =VLOOKUP(TRIM($L2),$A:$C,COLUMNS($A2:B2),FALSE)

When this formula is copied and pasted to the right, it becomes
=VLOOKUP(TRIM($L2),$A:$C,COLUMNS($A2:C2),FALSE) and the col_index_num now evaluates to 3. The range expands when pasted to the right. Furthermore, when the State column is inserted into the table_array, the col_index_num will also expand and return the correct result.

range_lookup

For just a few VLOOKUP functions, I would recommend using exact match (FALSE or 0) for the range_lookup argument because successful lookups will return the correct result and if the city is not found, an #N/A error is returned. However, if there are thousands of VLOOKUP functions in a data model and the table_array has a large number of rows, it will take a long time to recalculate the workbook. A significant performance boost is achieved by changing the range_lookup argument to approximate match (TRUE or 1). In order to do this, you must sort your table_array by the first column (City) and be sure that Buffalo appears in the table, otherwise you will return the population of the closest alphabetical match to Buffalo which may be Boston.

If you are interested in learning more about strengthening your data models, need a database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro