Saturday, July 19, 2014

SELECT FROM WHERE? - SQL for Business Professionals

The language of databases is SQL (pronounced Ess-Cue-Elle) – Structured Query Language. Being able to have a basic conversation in this language allows you to engage with the world’s databases and those who create and maintain them. Every year, more of the world’s business data is stored in tables. To discover the patterns and insights buried in the data, queries can be written in SQL to ask a question that the database will answer.

Once the exclusive domain of Database Developers, Database Administrators and Information Technology Professionals, SQL is now being used by more business professionals to have a competitive edge and direct access to databases. Direct access allows data analysis to occur organically as business questions are answered immediately, probably generating deeper questions.

I used to write SQL statements in the 1990’s and the great thing about SQL is that those same statements still work today. SQL is a very stable language across time, which is welcome when technology is constantly changing. Also, since SQL is also the most widely used method of communicating with databases, the same SQL statement can often work across different databases (Microsoft Access, SQL Server, Oracle…).. However, in more complex queries, small changes in syntax may be required.

When there is a lot of data, Excel won’t be nearly as fast as SQL at manipulating and summarizing data to yield insights. Excel is better for working with the results of a SQL query.

The best part about SQL is that it is really easy to learn. A sample query will illustrate the basic syntax of asking “How many recent orders contain more than one product, can you show me those orders?”

SELECT OrderID, COUNT(ProductID), SUM(UnitQuantity)
FROM tblOrderDetails
WHERE OrderID>=17239
GROUP BY OrderID
HAVING COUNT(ProductID)>1
ORDER BY SUM(UnitQuantity) DESC

The FROM Clause

Even though the SELECT clause is written first, we start with the FROM clause to understand the query. The FROM clause indicates where the data is coming from. In this case it is a table called “tblOrderDetails” which contains data about each product on each order a company fulfills. The picture below shows some sample records from the table. Each row in this table represents one product from one order and will contain data elements such as: Order Identification Number, Product Ordered, Unit Quantity of Product, Extended Cost of Units and Extended Price of Units. More complex queries will combine data from multiple tables.
















The WHERE Clause

Once we know where the data is coming from, we can filter the individual records to exclude orders we don’t want and include orders we do. In this query we want the most recent orders; those with an order identification number of 17239 or greater. All the records with lower order identification numbers are discarded at this point.

The GROUP BY Clause

Now that only some of the rows have survived to this point, it is time to summarize these records. I think that GROUP BY is the most powerful clause in SQL because it has the power to summarize many records into a total. In this query, we want to create a summary group for each order identification number. Order number 17239 has three products ordered and therefore three records in the table. These three records will be aggregated to one row at the order level so that we can later count the number of products (3) and sum the unit quantity (14).

The HAVING Clause

The HAVING clause is similar to the WHERE clause, it is a filter. The difference is that the WHERE clause was fed individual records from a table and HAVING is fed summarized groups (one row per order). HAVING occurs after the GROUP BY clause, so at this point, summary groups are created and we can filter out summary groups that we want to exclude. In this query, we want to keep the orders that have more than one product ordered. Any groups with only one product ordered are now discarded from the result (such as order numbers 17240 and 17241).

The ORDER BY Clause

Now that we know which groups will appear in the final result, we can present them with a certain sort order. In this query, we are going to perform a descending sort on the sum of unit quantity for each order. This means that the order with the most units will appear as the first row and the order with the fewest units will appear at the bottom.

The SELECT Clause

Now that the groups are created and sorted, we can choose which columns will be returned in the result. In this query we want to see the order identification number, followed by the count of the number of products and the sum of the units for that order.

Upcoming Course

If you would like to know more about using SQL to interact with databases, I teach a one-day course called: Introduction to SQL for Finance Professionals. The course goes into much greater detail and will have you writing SQL and returning results from databases. Join me at the Certified Management Accountants Professional Development Institute.

If you are interested in learning about Excel, VBA, Charts, need a dashboard, data model, database solution or analytics training, please contact Scott Cuthbertson:

Email: scottjcuthbertson@gmail.com
LinkedIn: http://ca.linkedin.com/in/scottcuthbertson
Blog: http://theanalyticspro.blogspot.ca/

Scott Cuthbertson BBA

Scott is an Analytics Professional who has worked with over 100 clients in the retail and financial services industries. He has helped organizations improve their labour scheduling, inventory replenishment, performance management, organization design and pricing. He has been a practitioner of analytics since childhood, using data models to win Hockey Pools. His expertise lies in developing data models, reporting, algorithms and solutions for any organization that wants to use their data to improve decision making. He also teaches at the CMA Professional Development Institute.

© Scott Cuthbertson 2014 Blog: Words From The Analytics Pro


No comments:

Post a Comment