Wednesday, November 23, 2016

The perils of giving your data objects methods

A colleague of mine hit a strange bug today. It so happened that we use a bastardized dependency injection method that takes into account the WCF session before returning an implementation of an interface. In a piece of code the injection failed and we couldn't see why for a while. Let me give you a simplified version:
var someManager=Package.Get<IManager>();
var someDTOs=Cache.GetDatabaseObjects().Select(x=>x.Pack());

public class DataObject {
    public string Data {get;set;}
    public DataObjectDTO Pack() {
        var anotherManager=Package.Get<IAnother>();
        return new DataObjectDTO {
            // ...
        };
    }
}

Package.Get attempts to find a session object. If there is none, it falls back to another mechanism; if it finds one, it uses it only if the session is neither expired nor invalid, and throws an exception otherwise. This code failed in the Pack method, when trying to get an instance of IAnother. Please take a few moments to reflect on why (and no, it's not that between calls the session expired).
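
The decision Package.Get makes can be sketched roughly as below. This is only an illustration of the branching described above, not the real implementation: the Session type, its IsExpired and IsValid members and the ChooseSource method are all hypothetical names.

```csharp
using System;

// Hypothetical stand-in for the WCF session state described in the post.
public sealed class Session
{
    public bool IsExpired { get; set; }
    public bool IsValid { get; set; } = true;
}

public static class PackageSketch
{
    // Returns which mechanism would be used to resolve an implementation.
    public static string ChooseSource(Session session)
    {
        // No session found: use the other mechanism.
        if (session == null) return "fallback";

        // A session was found but cannot be used: fail loudly.
        if (session.IsExpired || !session.IsValid)
            throw new InvalidOperationException("Session is expired or invalid.");

        // A usable session was found.
        return "session";
    }

    public static void Main()
    {
        Console.WriteLine(ChooseSource(null));
        Console.WriteLine(ChooseSource(new Session()));
    }
}
```

The point of the sketch is that the outcome depends entirely on what session, if any, is visible at the moment of the call.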

Monday, November 21, 2016

The expression being assigned to '[something]' must be constant when using integers inside strings

I've stumbled upon a very funny exception today. Basically I was creating a constant string from adding some other constant strings to each other. And it worked. The moment I added an integer, though, I got The expression being assigned to 'Program.x2' must be constant. The code that generated this error is simple:
const string x2 = "string" + 2;
Note that
const string x2 = "string" + "2";
is perfectly valid. Got the same result when using VS2010 and VS2015, so it's not a compiler bug, it's intended behavior.

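So, what's going on? Well, my code effectively transforms behind the scenes into
const string x2 = "string" + 2.ToString();
which is not constant because of ToString! Strictly speaking, the compiler binds this to the string + object concatenation operator, and the boxing conversion of the integer that it requires is not permitted in a constant expression.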

The only way to solve it was to declare the numeric constant as string as well.
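
As a small sketch of the options, assuming the value does not strictly have to be a const: the first pair keeps everything a string constant, as described above, while the second keeps the number numeric and settles for a static readonly field. The class and member names are made up for the example.

```csharp
using System;

public static class ConstDemo
{
    // Option 1: declare the numeric part as a string constant too.
    public const string Two = "2";
    public const string X2 = "string" + Two; // compile-time constant

    // Option 2: keep the number as an int and give up on const.
    public const int Number = 2;
    // Evaluated once, when the type is initialized, not at compile time.
    public static readonly string X3 = "string" + Number;

    public static void Main()
    {
        Console.WriteLine(X2);
        Console.WriteLine(X3);
    }
}
```

The trade-off is that a static readonly field cannot be used where the language demands a constant, such as attribute arguments or case labels.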

Sunday, November 20, 2016

Getting random rows from a table in T-SQL: TABLESAMPLE [instructional post, but not a recommended method]

This clause is so obscure that I couldn't even find the Microsoft reference page for it for a few minutes, so no wonder I didn't know about it. Introduced in SQL Server 2005, the TABLESAMPLE clause limits the number of rows returned from a table in the FROM clause to a sample number or PERCENT of rows.

TABLESAMPLE (sample_number [ PERCENT | ROWS ] ) [ REPEATABLE (repeat_seed) ]

REPEATABLE is used to set the seed of the random number generator so one can get the same result if running the query again.

It sounds great at the beginning, until you start seeing the limitations:
  • it cannot be applied to derived tables, tables from linked servers, and tables derived from table-valued functions, rowset functions, or OPENXML
  • the number of rows returned is approximate. 10 ROWS doesn't necessarily return 10 records. In fact, the functionality underneath transforms 10 into a percentage, first
  • a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table.
  • it isn't even that random!

Funny enough, even the reference page recommends a different way of getting a random sample of rows from a table:
SELECT * FROM Sales.SalesOrderDetail
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

Even if probably not really usable, at least I've learned something new about SQL.

More about getting random samples from a table here, where it explains why ORDER BY NEWID() is not the way to do it and gives hints of what really happens in the background when we invoke TABLESAMPLE.
Another interesting article on the subject, focused more on the statistical probability, can be found here, where it also shows how TABLESAMPLE's cluster sampling may fail in spectacular ways.

Monday, November 07, 2016

Am I a good person?

I am often left dumbfounded by the motivations other people are assigning to my actions. Most of the time it is caused by their self-centeredness, their assumption that whatever I do is somehow related more to them than to me. And it made me think: am I a good/bad person, or is it all a matter of perception from others?

I rarely feel like I do something out of the ordinary for other people; instead I do it because that's who I am. I help a colleague because I like to help or I refuse to do so because I feel that what I am doing is more important. Same with friends or romantic relationships. Sometimes I need to make an effort to do something, but it's still my choice, my assessment of the situation and my decision to go a certain way. It's not a value judgment on the person, it's not an asshole move or some out of my way effort to improve their life. What I do IS me.

It's also a weird direction of reasoning, since I am aware of the physical impossibility for "free will" and I subscribe to the school of thought that it is all an illusion. I mean, logic dictates that either the world works top-bottom, with some central power of will trickling down reality or it is merely a manifestation of low level forces and laws of physics that lead inexorably towards the reality we perceive. In other words, if you believe in free will, you have to believe in some sort of god, and I don't. Yet living my life as if I have no free will makes no sense either. I need to play the game if I am to play the game. It's kind of circular.

Getting back to my original question: Isn't good or bad just a label I (and other people) assign to a pattern of behavior that belongs to me? And not before I do things, but always afterwards. Just like the illusion of free will there is the illusion of moral quality that guides my path. While one cannot quantify free will, they can measure the effect my behavior has on their life and goals and determine a value. But then is my "goodness" something like an average? Because then the number of people I am affecting would be more important than the absolute value of the effect per person. Who cares if I help a colleague or pay attention to my wife? In the big sea of people, I am just a small fish that affects a few other small fish. We could all die tomorrow in the belly of a whale, all that goodness pointless.

So here I am, asking essentially a "who am I" question - painfully aware it has no final answer - in a world I think is determined by tiny laws of physics that create the illusion of self and with a quantity of consequence that is irrelevant even if it weren't so. I am torturing myself for no good reason, ain't I?

Yet the essence of the question still intrigues me. Is it necessary that I feel a good drive for my actions to be a good person, or is it a posterior calculation of their effect that determines that? If I work really well and fast for a month and then I do less the next, is it that I did good work in the first month or that I am a lazy bastard in the second? If I pay attention to someone or make a nice gesture, is it something to be lauded, or something to be criticized when I don't do it all the time? Is this a statistical problem or an issue of causality?

And I have to ask this question because if I feel no particular drive to do something and just "am myself", I don't think people should assign all kinds of stupid motivations to my actions. And if I need to make this sustained effort to go outside my routine just to gain moral value... well, it just feels like a lot of bother. And I have to ask it because the same reasoning can be applied to other people. Is my father making terrible efforts to take care of just about everybody in his life, making him some sort of saint, or is it just what he does and can't help himself, in which case he's just a regular dude?

Personally I feel that I am just an amalgamation of experiences that led to the way I behave. I am neither good nor evil and my actions define me more than my intentions. While there is some sort of consistency that can be statistically assessed, it is highly dependent on the environment and any inference would go down the drain the moment that environment changes. But then, how can I be a good person? And does it even matter?

Saturday, October 29, 2016

Controlling JSON serialization in .Net Core Web API (Serialize enum values as strings, not integers)

.Net Core Web API uses Newtonsoft's Json.NET to do JSON serialization. In other cases where you wanted to control Json.NET options, you would do something like
JsonConvert.DefaultSettings = (() =>
{
    var settings = new JsonSerializerSettings();
    // do something with settings
    return settings;
});
but in this case it doesn't work. The way to do it is to use the fluent interface method and hook into the ConfigureServices(IServiceCollection services) method, after the call to .AddMvc(), like this:
services.AddMvc()
    .AddJsonOptions(options =>
    {
        var settings=options.SerializerSettings;
        // do something with settings
    });

In my particular case I wanted to serialize enums as strings, not as integers. To do that, you need to use the StringEnumConverter class. For example if you wanted to serialize the Gender property of a person as a string you could have defined the entity like this:
public class Person
{
    public string Name { get; set; }
    [JsonConverter(typeof(StringEnumConverter))]
    public GenderEnum Gender { get; set; }
}

In order to do this globally, add the converter to the settings converter list:
services.AddMvc()
    .AddJsonOptions(options =>
        options.SerializerSettings.Converters.Add(new StringEnumConverter {
            CamelCaseText = true
        }));

Note that in this case, I also instructed the converter to use camel case. The result of the serialization ends up as:
{"name":"James Carpenter","age":51,"gender":"male"}
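
To try the converter outside of a web project, here is a minimal console sketch. It assumes the Newtonsoft.Json package is referenced. The Age property, the GenderEnum members and the EnumJsonDemo class are illustrative guesses, and the camel case contract resolver is added by hand to mimic what MVC did by default at the time. Newer Json.NET versions mark CamelCaseText as obsolete in favor of a naming strategy, so expect a compiler warning there.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Converters;
using Newtonsoft.Json.Serialization;

// Illustrative enum; the members are assumptions, not from the post.
public enum GenderEnum { Male, Female }

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public GenderEnum Gender { get; set; }
}

public static class EnumJsonDemo
{
    public static string Serialize(Person person)
    {
        var settings = new JsonSerializerSettings
        {
            // Camel-cases property names, like the MVC defaults of that era.
            ContractResolver = new CamelCasePropertyNamesContractResolver()
        };
        // Writes enum values as (camel-cased) strings instead of integers.
        settings.Converters.Add(new StringEnumConverter { CamelCaseText = true });
        return JsonConvert.SerializeObject(person, settings);
    }

    public static void Main()
    {
        var person = new Person { Name = "James Carpenter", Age = 51, Gender = GenderEnum.Male };
        Console.WriteLine(Serialize(person));
    }
}
```

With these settings the gender should come out as a lowercase string rather than a number, in the same shape as the result above.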

Saturday, October 22, 2016

Beware LINQ OrderBy in performance sensitive cases

I was doing this silly HackerRank algorithm challenge and I got the solution right, but it would always time out on test 7. I wracked my brain on all sorts of different ideas but to no avail. I was ready to throw in the towel and check out other people's solutions, only they were all in C++ and seemed pretty similar to my own. And then I made a code change and the test passed. I had replaced LINQ's OrderBy with Array.Sort.

Intrigued, I started investigating. The idea was creating a sorted integer array from a space delimited string of values. I had used Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).OrderBy(v=>v); and it consumed above 7% of the total CPU of the test. Now I was using var arr=Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).ToArray(); Array.Sort(arr); and the CPU usage for that piece of the code was 1.5%. So it was almost five times slower. How do the two implementations differ?
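
For anyone who wants to compare the two on their own machine, here is a rough timing sketch of both variants. It is not the profiling setup used above: the input (one million pseudo-random integers with a fixed seed) and the SortComparison class are made up for the example, the timings include the parsing, and a Stopwatch run like this is only a crude indication.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class SortComparison
{
    // Parses a space-delimited string and sorts through LINQ's OrderBy.
    public static int[] SortWithOrderBy(string line)
    {
        return line.Split(' ').Select(s => int.Parse(s)).OrderBy(v => v).ToArray();
    }

    // Parses the same input into an array, then sorts it in place.
    public static int[] SortWithArraySort(string line)
    {
        var arr = line.Split(' ').Select(s => int.Parse(s)).ToArray();
        Array.Sort(arr);
        return arr;
    }

    public static void Main()
    {
        // Hypothetical test input with a fixed seed so runs are comparable.
        var random = new Random(42);
        var line = string.Join(" ", Enumerable.Range(0, 1000000).Select(i => random.Next()));

        var watch = Stopwatch.StartNew();
        var viaLinq = SortWithOrderBy(line);
        Console.WriteLine("OrderBy:    " + watch.ElapsedMilliseconds + " ms");

        watch.Restart();
        var viaArray = SortWithArraySort(line);
        Console.WriteLine("Array.Sort: " + watch.ElapsedMilliseconds + " ms");

        // Both approaches must agree on the sorted result.
        Console.WriteLine(viaLinq.SequenceEqual(viaArray));
    }
}
```

The actual numbers will vary with the runtime version and the input, so treat them as a starting point for a proper benchmark rather than a verdict.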

Array.Sort should be simple: an in place quicksort, the best general solution for this sort (heh heh heh) of problem. How about Enumerable.OrderBy? It returns an OrderedEnumerable which internally uses a Buffer<T> to get all the values in a container, then uses an EnumerableSorter to ... quicksort the values. Hmm...

Let's get back to Array.Sort. It's not as straightforward as it seems. First of all it "tries" a SZSort. If it works, fine, return that. This is an external native code implementation of QuickSort on several native value types. (More on that here) Then it goes to a SorterObjectArray that chooses, based on framework target, to use either an IntrospectiveSort or a DepthLimitedQuickSort. Even the implementation of this DepthLimitedQuickSort is much, much more complex than the quicksort used by OrderBy. IntrospectiveSort seems to be the one preferred for the future and is also heavily optimized, but less complex and easier to understand, perhaps. It uses quicksort, heapsort and insertionsort together.

Now, before you go all "OrderBy sucks!", read more about it. This StackOverflow list of answers seems to indicate that in case of strings, at least, the performance is similar. A lot of other interesting things there, as well. OrderBy uses a "stable" QuickSort, meaning that two items that are compared as equal will appear in their original order. Array.Sort does not guarantee that.
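
The stability guarantee is easy to see with a key that produces ties. In this small sketch (the word list and the StabilityDemo class are made up for the example) the words are ordered by length only, so words of equal length keep their original relative order.

```csharp
using System;
using System.Linq;

public static class StabilityDemo
{
    // Orders by length only; ties keep their original relative order
    // because OrderBy is documented as a stable sort.
    public static string[] SortByLength(string[] words)
    {
        return words.OrderBy(w => w.Length).ToArray();
    }

    public static void Main()
    {
        var words = new[] { "pear", "fig", "kiwi", "plum", "yam" };
        // Prints: fig yam pear kiwi plum
        Console.WriteLine(string.Join(" ", SortByLength(words)));
    }
}
```

Sorting the same data with Array.Sort and a length comparer may or may not give the same order for the ties, which is exactly what "not guaranteed" means.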

Anyway, the performance difference in my particular case seems to come from the native code implementation of the sort for integers, rather than algorithmic improvements, although I don't have the time right now to grab the various implementations and test them properly. However, just from the way the code reads, I would bet the IntrospectiveSort will compare favorably to the simple Quicksort implementation used in OrderBy.

Friday, October 14, 2016

My first DMCA notice

Today I received two DMCA notices. One of them might have been true, but the second was for a file which started with
Copyright (c) 2010, Yahoo! Inc. All rights reserved.
Code licensed under the BSD License:
version: 2.8.1
Nice, huh?

The funny part is that these are files on my Google Drive, which are not used anywhere anymore and are accessible only by people with a direct link to them. Well, I removed the sharing on them, just in case. The DMCA is even more horrid than I thought. The links in it are general links towards a search engine for notices (not the link to the actual notice) and some legalese documents, the email it is coming from is and any hope that I might fight this is quashed with clear intention from the way the document is worded.

So remember: Google Drive is not yours, it's Google's. I wonder if I would have gotten the DMCA even if the file was not being shared. There is a high chance I would, since no one should be using the link directly.

Bleah, lawyers!