Friday, August 08, 2014

Sending complex objects to T-SQL using the XML type

I have been working on a REST API lately and, while using Entity Framework or some other similar framework to abstract the database is certainly possible, I wanted to control every aspect of the implementation. I know, reinventing wheels, but this is how one learns. One of the most annoying bits was trying to translate some complex object from JSON to the (two dimensional) database relational tables. This post will explore my attempts and the solutions I have found.

My first attempt was straightforward: just send all types as DataTables, with some extra property to define identity and parent entity. This relies on the Microsoft Server SQL mechanism that allows sending of table variables to stored procedures. But this approach has several downsides. One of them is that in order to send a datatable to SQL you need... DataTables. As I have pointed out in several blog posts, the DataTable object is slow, and sometimes downright buggy. Even if I didn't care about performance that much, in order for SQL to receive the content of the DataTable one must create corresponding User Defined Types on the database side. Working with UDTs is very difficult for several reasons: you cannot alter a UDT (unless employing some SQL voodoo that changes system tables), you can only drop it and recreate it. This does not work if you use the UDT anywhere, so a lot of renaming needs to be done. Even if you automate the process, it's still very annoying. Then the UDT definition has to be an exact duplicate of the DataTable definition. Move some columns around and it fails. Debugging is also made difficult by the fact that the SQL profiler does not see the content of table variables when sending them to the server.

Long story short, I was looking for alternatives and I found XML. Now, you might think that this leads to a simple, and maybe even obvious, solution. But it's not that easy. Imagine that you send a list of objects in an XML. Each object is represented by an XML element and each property by a child element. In order to get the value of a property you need to do iterate through all the nodes, for each node find the properties, for each property find the one element that defines it, then get the attribute value or content of the property, all while making sure you select everything in a table. It's not that easy.

The solution I found, which simplifies the SQL code (and hopefully brings some well needed performance to the table) is to serialize the objects in a way that makes the selection simple enough. Here is an example: I have a Configuration object with an Id and a Name that also has a property called Servers, containing Server objects having an Id and a Url. Here is an example of XML serialization from the DataContractSerializer:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration xmlns="http://schemas.datacontract.org/2004/07/SerializationTest" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
   <Id>1</Id>
   <Name>Test Config</Name>
   <Servers>
      <Server>
         <Id>1</Id>
         <Url>http://some.url</Url>
      </Server>
      <Server>
         <Id>2</Id>
         <Url>http://some.other.url</Url>
      </Server>
   </Servers>
</Configuration>
The SQL code to get the information from an XML variable with this content would look like this:
DECLARE @Xml XML='<?xml version="1.0" encoding="UTF-8"?>
<Configuration xmlns="http://schemas.datacontract.org/2004/07/SerializationTest" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
   <Id>1</Id>
   <Name>Test Config</Name>
   <Servers>
      <Server>
         <Id>1</Id>
         <Url>http://some.url</Url>
      </Server>
      <Server>
         <Id>2</Id>
         <Url>http://some.other.url</Url>
      </Server>
   </Servers>
</Configuration>'

;WITH XMLNAMESPACES(DEFAULT 'http://schemas.datacontract.org/2004/07/SerializationTest')
SELECT T.Item.value('(Id/text())[1]','INT')  as Id,
       T.Item.value('(Name/text())[1]','NVARCHAR(100)')  as Name
FROM @Xml.nodes('//Configuration') as T(Item)

;WITH XMLNAMESPACES(DEFAULT 'http://schemas.datacontract.org/2004/07/SerializationTest')
SELECT T.Item.value('(Id/text())[1]','INT')  as Id,
       T.Item.value('(Url/text())[1]','NVARCHAR(100)')  as Url
FROM @Xml.nodes('//Configuration/Servers/Server') as T(Item)

This works, but look at that code. In my case, the situation was worse, the object I was using was a wrapper which implemented IDictionary<string,object> and, even if it did implement ISerializable, both XmlSerializer and DataContractSerializer use the dictionary as their data and in the end I get ugly key elements and value elements that are even harder to get to and, I suppose, more inefficient to parse. Therefore I found the solution in IXmlSerializable, (yet) another serialization interface used exclusively by XML serializer classes. If every simple value would be saved as an attribute and every complex object in an element, then this could be the SQL code:
DECLARE @Xml XML='<?xml version="1.0" encoding="UTF-8"?>
<Configuration xmlns="http://schemas.datacontract.org/2004/07/SerializationTest" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
   <Config Id="1" Name="Test Config">
   <Servers>
      <List>
        <Server Id="1" Url="http://some.url" />
        <Server Id="2" Url="http://some.other.url" />
      </List>
   </Servers>
   </Config>
</Configuration>'

;WITH XMLNAMESPACES(DEFAULT 'http://schemas.datacontract.org/2004/07/SerializationTest')
SELECT T.Item.value('@Id','INT')  as Id,
       T.Item.value('@Name','NVARCHAR(100)')  as Name
FROM @Xml.nodes('//Configuration/Config') as T(Item)

;WITH XMLNAMESPACES(DEFAULT 'http://schemas.datacontract.org/2004/07/SerializationTest')
SELECT T.Item.value('@Id','INT')  as Id,
       T.Item.value('@Url','NVARCHAR(100)')  as Url
FROM @Xml.nodes('//Configuration/Config/Servers/List/Server') as T(Item)

Much easier to read and hopefully to parse.

I am not going to write here about the actual implementation of IXmlSerializable. There are plenty of tutorials on the Internet about that. It's not pretty, :) but not too difficult, either.

What was the purpose of this exercise? Now I can send a complex object to SQL in a single query, making inserts and updates simple and not requiring at a call for each instance of each type of complex object. Now, is it fast? I have no idea. Certainly if performance is needed, perhaps the UDT/DataTable approach is faster. However you will have to define a type for each type that you send as a DataTable to a stored procedure. An alternative can be a binary serializer and a CLR SQL function that translates it into tables. However, in my project I need to easily implement very custom API methods and to control every aspect, including tracing and profiling the various SQL calls. I believe the customized IXmlSerializable/XML in SQL approach is a reasonable one.

As always, I hope this helps someone.

1 comments:

Andrei Rînea said...

Very good points raised against UDTs... haven't realized all of them till now.