Friday, October 25, 2013

Transposition of ranges in SQL and charts

I had this database table containing ranges (a start value and an end value). The challenge was creating a query that overlaps and transposes those ranges so I can say how many ranges are at any point in the total interval or values. As an example, "SELECT * FROM Ranges" would result in a table like:
StartEnd
1020
1030
2535
2040
and I am looking for something like this:
ValueCount
00
10
......
102
112
......
242
253
263

A naive implementation would get the minimum Start (or start with 0, as I did) and the maximum End, create an in memory or temporary table (Values) from min to max using an ugly WHILE block, then join it with the Ranges tables something like:
SELECT v.Val,Count(1) as Nr
FROM #Values v
INNER JOIN Ranges r
ON r.Start<=v AND r.[End]>=v

This kind of works, but for large ranges it becomes difficult. It takes a lot to create the Values table and the join and for extreme cases, like I had with values from 0 to 6 billion, it becomes impossible. The bottleneck here is this Values table, which is pretty much a horror to create and maintain. But what if you don't need all the values?

Before I tell you the solution I found, be warned that you have to properly define what a range is. Is a range 10-20 actually 10-19? In my case it was so, so that is why there are some subtractions with 1 or less than rather than less or equal conditions.

The solution is this:
SELECT DISTINCT Val
INTO #Values
FROM (
  SELECT 0 as Val
    UNION ALL
  SELECT Start FROM Ranges
    UNION ALL
  SELECT [End]-1 FROM Ranges
) x
ORDER BY Val

The idea is that after you compute the ranges count per each of the start and end values you know that between one and the next the count of ranges will remain the same. The join is significantly faster, there is no ugly WHILE block and you don't need a 6 billion value table. It's easier to plot on a chart as well, with either of these variations:
See the variations:
SELECT v.Val,
       Count(r.Start) as Nr
FROM #Values v
LEFT JOIN Ranges r
  ON r.Start<=v.Val AND r.[End]>v.Val
GROUP BY v.Val

The result of the query above will be:
ValueNr
00
102
192
202
253
293
342
391
Hope this helps you

0 comments: