mandag den 29. juli 2013

SQL Server: Common Table Expressions

One nice thing I stumbled upon reading the "Querying Microsoft SQL Server 2012" book is a feature called "Commen Table Expressions" or CTE's.

As it turns out the feature was already available in SQL Server 2005. If other people than me managed to be ignorant of its existence then let me try to explain what it is about.

The basic form of a CTE is this:
WITH <CTE_name> AS
(
   <inner_query>
)
<outer_query>

This form allows us to assign a name (CTE_name in the above) to a query and then use that in the <outer_query>.

Of course, this is just different kind of view or subquery (derived table). It allows one to avoid creating permanent database objects (as with views). It it also arguably more readable than named subqueries and - more importantly - avoids the need to repeat the definition of a subquery as is sometimes necessary.

Consider this example:
select ...
from (<subquery>) T1
inner join (<subquery>) T2 on ...

When self-joining the <subquery> one must repeat the defintion of the <subquery>, which is of course redundant.

For any functional programmers out there, CTE's play the role of let-expressions.

Recursive CTE

Nice as it is, the above form of CTE is mostly syntactical sugar, but there is another form of CTE that does something entirely different and makes it easier and faster to deal with hierachies - something that has always been notoriously difficult in SQL.

Enter the recursive CTE:
WITH <CTE_name>
AS
(
   SELECT col_1, ..., col_n
   FROM <table_expression_1>

   UNION ALL

   SELECT col_1, ..., col_n
   FROM <table_expression_2>
)
<outer_query>
This is interesting because the <table_expression_2> is allowed to refer to CTE_name.What happens is that SQLServer keeps evauating the <table_expression_2> and append rows to the final CTE until <table_expression_2> evaluates to an empty set. References to name of the CTE (CTE_name in this example) in table_expression_2 referes to the previous iteration.

EXAMPLE

Consider a table with some sort of hieracy (eg. a folder structure):
Id Parent_id
1 null
2 1
3 1
4 3
Say we want to retrieve the node 4 and all the parent nodes all the way to the root. Using "ordinary"< SQL we can only retrieve a constant number of leves (using the appropriate number of self-joins), but retriving an arbitrary number of levels is a challenge.

An elegant solution using recursive CTE's is:
with CTE_nodes as
(
   select n.id, n.parent_id
   from nodes n
   where n.id=4

   union all

   --- this is the "magic part"
   from CTE_nodes n2              
   inner join nodes n on n.id=n2.parent_id
)
select id, parent_id from CTE_nodes   
The result is:
id parent_id
1null
31
43

The obvious use case is retrieving data from hierarchical tables, but it seems very exciting to have recursive querying capabilities. This has been also been possible using user defined functions, but I have more confidence that the optimzer will be able to handle recursive CTE's more efficiently than recursive user defined functions. I would be happy to hear about real world experience with recursive CTE's.

Ingen kommentarer:

Send en kommentar