Tuesday, January 06, 2009

Serializing objects in .Net

This is something I have been meaning to write about for quite some time, but actually, I never had to work with a proper serialization scenario until recently. Here it goes:

First of all, whenever people want to save an object to a string, they google ".Net serializer" and quickly reach the XmlSerializer, because that's what most people think serialization is. But actually, it is not. The whole point of serializing an object is that you can transfer and store it. Therefore you need to use a format that is as open, clear and standard as possible, and to send only the relevant data, which in the case of objects is the PUBLIC data. For that, the XmlSerializer does its job, although it does have some problems that I describe later.
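
To make this concrete, here is a minimal sketch of the usual XmlSerializer round trip through a string. The Person class and the XmlDemo helper are made-up names for illustration; the XmlSerializer, StringWriter and StringReader calls are the standard System.Xml.Serialization and System.IO ones.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type: public, with an implicit parameterless
// constructor and public read/write properties, as XmlSerializer expects.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical helper wrapping XmlSerializer for string round trips.
public static class XmlDemo
{
    // Serializes an object of type T to an XML string.
    public static string ToXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Rebuilds an object of type T from XML produced by ToXml.
    public static T FromXml<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = XmlDemo.ToXml(new Person { Name = "Ann", Age = 30 });
        Console.WriteLine(xml);

        Person copy = XmlDemo.FromXml<Person>(xml);
        Console.WriteLine(copy.Name + " " + copy.Age); // Ann 30
    }
}
```

Only the public data ends up in the XML, which is exactly the "send only the relevant data" idea above.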

But suppose you didn't really want to send mere data over to another computer, but an entire object, with its state intact, ready to do work as if the transfer never happened? Then you need to FORMAT the object. Enter the IFormatter interface and its most prominent implementation: BinaryFormatter. Funnily enough, the methods used to spurt an object through a stream are also called Serialize and Deserialize. The advantage of the IFormatter approach is that it saves the entire object graph, private members included, and doesn't have all the requirements the XmlSerializer does. It also produces smaller output. So, is this it? Why use Xml (which everybody secretly hates) when you can use the good ole obscure binary file with almost no trouble? Well, because of the "almost". This way of doing things is not foolproof either.
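
A minimal sketch of the BinaryFormatter way, assuming the .NET Framework this post was written against (BinaryFormatter is obsolete in later .NET versions and disabled by default in the most recent ones). The Counter class and BinaryDemo helper are hypothetical: Counter has no parameterless constructor, keeps its state in a private field and points back to itself, and the whole graph still comes back intact.

```csharp
// Assumes the .NET Framework; BinaryFormatter is obsolete in modern .NET.
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical sample type that XmlSerializer could not handle.
[Serializable]
public class Counter
{
    private int count;        // private state is serialized too
    public Counter Sibling;   // object references are kept as part of the graph

    public Counter(int start) { count = start; } // no parameterless constructor

    public void Increment() { count++; }

    public int Count { get { return count; } }   // read-only property
}

// Hypothetical helper: writes a graph to memory and reads it back (a deep copy).
public static class BinaryDemo
{
    public static T RoundTrip<T>(T value)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, value);
            stream.Position = 0; // rewind before reading
            return (T)formatter.Deserialize(stream);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var original = new Counter(5);
        original.Increment();
        original.Sibling = original; // circular reference

        Counter copy = BinaryDemo.RoundTrip(original);
        Console.WriteLine(copy.Count);                                 // 6
        Console.WriteLine(object.ReferenceEquals(copy, copy.Sibling)); // True
        Console.WriteLine(object.ReferenceEquals(copy, original));     // False
    }
}
```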

Some people feel that sending only the data is not serialization, and that saving the complete graph and internal state of the object is. Wikipedia says: "serialization is the process of converting an object into a sequence of bits so that it can be stored on a storage medium (such as a file, or a memory buffer) or transmitted across a network connection link. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object." So they hedge with the vague phrase "semantically identical", which pretty much says "they mean the same thing even if their structures may differ". So I think I am right, since the BinaryFormatter has real trouble when the structure changes.

Now for the quick and dirty reference:
The XmlSerializer
  • Only serializes public READ/WRITE properties and fields - it silently skips read-only properties instead of throwing an error, so be careful
  • Needs the class to have a parameterless constructor - this pretty much restricts the design of the classes you can serialize
  • Does not work on Dictionaries
  • Serializes against a type definition rather than the exact original type, which means you can still use it when the types are named differently or are different versions, even radically different ones, as it only fills in the values it stored and ignores the rest
  • There are all sorts of attributes you can decorate your classes with to control serialization, as well as some events that are raised during deserialization
  • Cannot handle circular references: it throws an InvalidOperationException when it detects one
  • If your class implements IXmlSerializable it can control how the serialization is done
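
Here is a sketch, using a hypothetical Order class, that shows a few of these points at once: a read-only property that is silently left out, [XmlIgnore] to exclude a read/write member on purpose, and [XmlAttribute] to write a value as an XML attribute instead of an element.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type illustrating some XmlSerializer rules.
public class Order
{
    public Order() { }                      // parameterless constructor is required

    [XmlAttribute("id")]                    // written as id="..." on the root element
    public int Id { get; set; }

    public string Customer { get; set; }    // public read/write: serialized

    [XmlIgnore]                             // public read/write, but excluded
    public string InternalNote { get; set; }

    public string Summary                   // read-only: skipped, no error thrown
    {
        get { return Customer + " #" + Id; }
    }
}

public static class Program
{
    public static string ToXml(Order order)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var order = new Order { Id = 7, Customer = "ACME", InternalNote = "rush" };
        // The output contains id="7" and <Customer>ACME</Customer>,
        // but neither Summary nor InternalNote.
        Console.WriteLine(ToXml(order));
    }
}
```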

The BinaryFormatter
  • It serializes all fields, public and private, readonly or read/write (and therefore the state behind the properties too), as long as their types are marked as Serializable - that sucks for classes that are not yours
  • It's a rigid method of serializing objects - if you change the source or destination objects or even their namespace, the deserialization won't work
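
To illustrate the Serializable requirement, here is a sketch in which a hypothetical Connection class stands in for a type that is not yours and not marked [Serializable]. Serializing an object that holds one fails with a SerializationException; marking the field [NonSerialized] skips it, but the field then comes back as null. Again, this assumes the .NET Framework BinaryFormatter.

```csharp
// Assumes the .NET Framework; BinaryFormatter is obsolete in modern .NET.
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical "third-party" type that is NOT marked [Serializable].
public class Connection
{
    public string Host = "localhost";
}

[Serializable]
public class Session
{
    public string User = "ann";
    public Connection Conn = new Connection(); // makes the whole graph fail
}

[Serializable]
public class SafeSession
{
    public string User = "ann";

    [NonSerialized]                            // skipped; null after deserialization
    public Connection Conn = new Connection();
}

public static class Program
{
    // Returns false when BinaryFormatter rejects part of the object graph.
    public static bool TrySerialize(object value)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            try
            {
                formatter.Serialize(stream, value);
                return true;
            }
            catch (SerializationException)
            {
                return false;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(TrySerialize(new Session()));     // False
        Console.WriteLine(TrySerialize(new SafeSession())); // True
    }
}
```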

The SoapFormatter
  • Just when I thought I had found a class that combines the benefits of both BinaryFormatter and XmlSerializer, it turns out it has been obsoleted in .Net 3.5. Besides, it did far less than the BinaryFormatter

It seems that Microsoft's idea of serialization blatantly differs from mine. I would have wanted a class that can serialize to binary or Xml based on a simple property, handle either only the public or both the public and private fields and properties, and be flexible in how decorating attributes are used and what the output is. In my project I had to switch from BinaryFormatter, which seemed to solve all problems, to XmlSerializer (thus having to change a lot of the classes and the design of the app) just because the class sent by the client application could not have the same namespace as the one on the server.

That doesn't mean one cannot build their own class to do everything I mentioned above, of course. Here are some CodeProject examples:
A Deep XmlSerializer, Supporting Complex Classes, Enumerations, Structs, Collections, and Arrays
AltSerializer - An Alternate Binary Serializer.

Update: the .Net framework 3.5 has added a class called JavaScriptSerializer (in System.Web.Script.Serialization) which turns an object into a single line of text, JSON style. It worked great for me for logging hierarchical or collection-based data. Use it just like the XmlSerializer.
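
A minimal sketch of that kind of logging use; the LogEntry class is just an example. JavaScriptSerializer lives in System.Web.Script.Serialization, so the project needs a reference to System.Web.Extensions.dll on the .NET Framework.

```csharp
// Requires a reference to System.Web.Extensions.dll (.NET Framework 3.5+).
using System;
using System.Collections.Generic;
using System.Web.Script.Serialization;

// Hypothetical log record with nested, collection-based data.
public class LogEntry
{
    public string Level { get; set; }
    public List<string> Tags { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var entry = new LogEntry
        {
            Level = "Info",
            Tags = new List<string> { "db", "slow" }
        };

        var serializer = new JavaScriptSerializer();
        string json = serializer.Serialize(entry); // one line of JSON, handy for a log file
        Console.WriteLine(json);

        // Same Serialize/Deserialize pairing as the XmlSerializer.
        LogEntry back = serializer.Deserialize<LogEntry>(json);
        Console.WriteLine(back.Tags.Count); // 2
    }
}
```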

Another link to check out is this one: Fast Serialization, but read the entire article before using any code.