Using SQL PIVOT and UNPIVOT
I learned about a pair of SQL keywords I didn’t know existed before. I suppose I shouldn’t have been so surprised
this ability existed in SQL since I knew it existed within Excel spreadsheets, but it really saved the day this
last week. I’m speaking of the PIVOT
and UNPIVOT
SQL keywords.
This last week I needed to track down some data inconsistencies in our database by joining a few view model tables. These tables are populated by event handlers that denormalize the relevant data from events into a view model table that serves the needs of a specific part of the UI. In one particular view model table, it made sense to arrange the data points specific to each US state as columns on a single row. This allowed for a grid-based UI that enables batch operations on many rows at one time. A second view model table was intended for a different use case and was designed so that the state-specific data points became a one-to-many relationship across multiple rows and only two columns for the key-value pair of the US state and its associated data point value. These were two of the tables I needed to join together during my database inquisition.
At this point I figured I was stuck, as I couldn’t see a sensible way to join two tables designed so contrary to
each other for the purposes of joining. I tried to think of any ways to dynamically join tables on a column name
in one table specified by the value in the other table’s row. I knew I was heading nowhere fast, so I asked our
DBA for some ideas. He informed me of the PIVOT
and UNPIVOT
keywords in SQL as one option to solve the problem.
We reasoned that even though it may not be the most efficient feature, the likelihood of it proving to be a
performance bottleneck would be essentially non-existent (given that the entire size of the data set was relatively
small, on the magnitude of hundreds or thousands of rows instead of millions of rows). And so I gave it a go.
I will now describe an example scenario in order to illustrate how I used the UNPIVOT
feature to join two tables
that didn’t seem at all compatible. The goal will be to join a table representing what US states each product is
available for sale within to a table representing how many of each product are being sold in each state. We will
call the first table ProductAvailability
and give it an integer identity column named Id
, an integer column
named ProductId
, and then 52 bit
columns indicating the availability status of the product in each of the 50 US
states, the District of Columbia, and Puerto Rico. These columns will be named by their two letter abbreviations of
AK
through WY
. The second table we will call ProductSales
and it will also have an integer identity column
named Id
and an integer column named ProductId
. In addition it will have a string column (specifically
char(2)
) named StateCode
and an integer column named TotalSales
.
The unpivoting of the ProductAvailability
table to prepare it for joining can be done like so:
SELECT Unpivoted.ProductId, Unpivoted.StateCode, Unpivoted.IsAvailable FROM ProductAvailability UNPIVOT ( IsAvailable FOR StateCode IN (AK,AL,AR,AZ,CA,CO,CT, DC,DE,FL,GA,HI,IA,ID, IL,IN,KS,KY,LA,MA,MD, ME,MI,MN,MO,MS,MT,NC, ND,NE,NH,NJ,NM,NV,NY, OH,OK,OR,PA,PR,RI,SC, SD,TN,TX,UT,VA,VT,WA, WI,WV,WY) ) AS Unpivoted
The interesting part is obviously the UNPIVOT
clause. Here we are declaring a list of column names to unpivot and
also naming two new columns in which to place the old column’s name and value (StateCode
and IsAvailable
respectively). The result is that instead of having one row of 54 columns for each product, we now have 52 rows of
three columns to describe the exact same availability of the product. But with this second schema, we can more
easily join with the ProductSales
table.
Here is an example of such a query where we are trying to find the products which have no sales in a state for which it is marked as available for sale:
SELECT ProductStateAvailability.ProductId, ProductStateAvailability.StateCode FROM ( SELECT Unpivoted.ProductId, Unpivoted.StateCode, Unpivoted.IsAvailable FROM ProductAvailability UNPIVOT ( IsAvailable FOR StateCode IN (AK,AL,AR,AZ,CA,CO,CT, DC,DE,FL,GA,HI,IA,ID, IL,IN,KS,KY,LA,MA,MD, ME,MI,MN,MO,MS,MT,NC, ND,NE,NH,NJ,NM,NV,NY, OH,OK,OR,PA,PR,RI,SC, SD,TN,TX,UT,VA,VT,WA, WI,WV,WY) ) AS Unpivoted WHERE Unpivoted.IsAvailable = 1 ) AS ProductStateAvailability INNER JOIN ProductSales ON ProductSales.StateCode = ProductStateAvailability.StateCode AND ProductSales.ProductId = ProductStateAvailability.ProductId WHERE ProductSales.TotalSales = 0
I hope you find the PIVOT
and UNPIVOT
SQL keywords just as helpful as I did if you ever find yourself in a bind
with two tables that are seemingly impossible to join.
This would also be a great intro to behaviors for the fubumvc guides.