Monday, November 25, 2013

Joining the rows of a table to the best row of another table in T-SQL

This is something I have been hitting my head on from the beginning of my programming career: just find the best match in a table for each row in another table through a single query.

There are solutions, but they are all very inefficient. To demonstrate the issue I will start with a simple structure: tables A and B, having the same columns id, x and y. I want to get, for each point in table A defined by the (x,y) coordinates, the closest point in table B. I only need one and it doesn't need to be exclusive (other points in A might be closest to the same point). It doesn't even have to be one row in B for each row in A, in case there are two points at the exact same distance to a point in A. The creation of the structure is done here:
CREATE TABLE A(id INT PRIMARY KEY IDENTITY(1,1), x FLOAT, y FLOAT)
INSERT INTO A (x,y) VALUES(10,20),(20,30),(20,10),(30,20),(30,20),(10,30)

CREATE TABLE B(id INT PRIMARY KEY IDENTITY(1,1), x FLOAT, y FLOAT)
INSERT INTO B (x,y) VALUES(11,20),(20,31),(21,10),(31,21),(30,20),(11,30)

To find the distance from A to the closest point in B is trivial:
SELECT a.id, 
       a.x, 
       a.y, 
       Min(( a.x - b.x ) * ( a.x - b.x ) + ( a.y - b.y ) * ( a.y - b.y )) AS 
       dist 
FROM   a 
       CROSS JOIN b 
GROUP  BY a.id, 
          a.x, 
          a.y 
To get the id of the closest B point, not so easy.

The first naive solution would be to just find the row in B that corresponds to each row in A using nested selects, like this:
SELECT * 
FROM   a 
  JOIN b 
    ON b.id = (SELECT TOP 1 b.id 
                    FROM   b 
                    ORDER  BY ( a.x - b.x ) * ( a.x - b.x ) + ( a.y - b.y ) *  ( a.y - b.y ) ASC)

Looking at the execution plan we see what is going on: 86% of the query is spent on "Top N Sort".
Let's get some other solutions so we can compare them in the end in the same execution plan.

Another solution is to just use the result of the query that computes the distance and just join again on the distance. That means we would compare each row in A with each row in B twice, once for the computation of the MIN function and the other for the join:
SELECT j.*, 
       b2.* 
FROM   (SELECT 
a.id, 
a.x, 
a.y, 
Min(( a.x - b.x ) * ( a.x - b.x ) + ( a.y - b.y ) * ( a.y - b.y )) AS m 
        FROM   a 
               CROSS JOIN b 
        GROUP  BY a.id, 
                  a.x, 
                  a.y) j 
       INNER JOIN b b2 
               ON j.m = ( j.x - b2.x ) * ( j.x - b2.x ) + ( j.y - b2.y ) * ( j.y - b2.y ) 

Something that does the same thing, essentially, but looks a little better is joining the table A with B and then again with B on having the point from B2 be closer to the one in B1, but then adding a condition that there is no B2 (in other words, B1 is closest):
SELECT a.*, 
       b1.* 
FROM   a 
       CROSS JOIN b b1 
       LEFT JOIN b b2 
              ON ( a.x - b1.x ) * ( a.x - b1.x ) + ( a.y - b1.y ) * 
                                                   ( a.y - b1.y ) > 
                 ( a.x - b2.x ) * ( a.x - b2.x ) + ( a.y - b2.y ) * 
                                                   ( a.y - b2.y ) 
WHERE  b2.id IS NULL 

None of these solutions scan B only once for each row in A. Their relative complexity is this: 75%, 11% and 14%, respectively. In other words, finding the minimum distance and then joining with the B table again on the points that are in exactly that distance is the best solution. However, given some assumptions and a weird structure, we can get to something that runs in half that time:
SELECT id      AS Aid, 
       x, 
       y, 
       m % 100 AS bId 
FROM   (SELECT a.id, 
               a.x, 
               a.y, 
               Min(Cast( ( ( a.x - b.x ) * ( a.x - b.x ) + ( a.y - b.y ) * ( a.y - b.y ) ) AS BIGINT) * 100 + b.id) AS m
        FROM   a 
               CROSS JOIN b 
        GROUP  BY a.id, 
                  a.x, 
                  a.y) j 

These are the assumptions that must be true in order for this to work:
  • The function value can be converted to a BIGINT without problems. (if the distance between points would have been subunitary, this would have lost precision)
  • The maximum ID in table B is under a certain value (in this case 100)
  • The converted function multiplied by this maximum number doesn't cause an overflow
Basically I am mathematically creating a container for the value of the function and the id of the point in B, computing the minimum, then extracting the id back from the value. Neat.

Another solution, one that makes most apparent sense, is using a feature that was introduced in SQL Server 2005: RANK. We rank the points in B to each point in A, based on our function, then we only get the first by selecting on the rank being 1. Unfortunately, this doesn't work as expected. First of all, you cannot use RANK in the WHERE clause, so you must select the rank first, then select from that selection to add the condition. This might mean horrid temporary data tables if tables A and B are huge. Also, after running the query, it appears it is slower than the one that joins on the minimum distance. Here it is:
SELECT aid, 
       bid 
FROM   (SELECT a.id                                                      AS aId, 
               a.x, 
               a.y, 
               b.id                                                      AS bId, 
               Rank() 
                 OVER( 
                   partition BY a.id 
                   ORDER BY (a.x-b.x)*(a.x-b.x)+(a.y-b.y)*(a.y-b.y) ASC) AS rnk 
        FROM   a 
               CROSS JOIN b) x 
WHERE  rnk = 1 

Comparing all the solutions so far, without the first naive one, with the nested selects, we get these values:
  1. Mathematical container of function value and id: 14%
  2. Selection of the minimum distance to each point and then joining with table B for the second time to look for the point that is at that distance: 21%
  3. Joining twice on the same table with the condition that one is better than the other and that the better one doesn't actually exist: 29%
  4. Using RANK: 36%, most surprisingly the worst solution

The final solution, adding some more computation in order to get rid of constants and some assumptions thus becomes:
DECLARE @MaxId BIGINT 

SELECT @MaxId = Isnull(Max(id) + 1, 1) 
FROM   B;

WITH q AS (SELECT A.id, 
               A.x, 
               A.y, 
               Min(Cast(Power(A.x-B.x, 2) + Power(A.y-B.y, 2) AS BIGINT) * @MaxId + B.id) AS m 
        FROM   A 
               CROSS JOIN B 
        GROUP  BY A.id, 
                  A.x, 
                  A.y)
SELECT id         AS aId, 
       x, 
       y, 
       m % @MaxId AS bId 
FROM   q; 


I am still looking and there is now a question on StackOverflow that attempts to get the answer from the community, so far with limited results.

Friday, November 22, 2013

Prevent a Windows computer from going idle even when the Group Policy prevents you from installing any software

I work in this silly place where everything must be done according to some plan or procedure. They aren't even very good at it, but they are very proud of this bureaucracy. For example I don't have Outlook installed on my work machine, but on a virtual one which is in a different network and can be accessed only by remote desktop protocol. Some admin with a God complex thought it was a good idea to make the computer lock itself after a few minutes of idleness and even close the entire virtual machine when no one accesses it for a while. This might have some sick sense in the admin's head, but I need to know when an email arrives and so I would like to have this virtual machine open on the second monitor without having to enter the password every 5 minutes. To add hurt to offence, I cannot install any software on the virtual machine or using Powershell to prevent the computer going idle or anything useful like that. Good, a challenge! I need to find a way to keep the remote desktop session alive.

Enter Windows Script Hosting. I've created a small Javascript file that gets executed by the machine and occasionally moves the mouse and simulates pressing Shift. No more idleness and no need to access Group Policy or install anything. Just create a text file and paste the following code and then save it with a .js extension, then run it. It will keep the computer from going idle.
var WshShell = WScript.CreateObject("WScript.Shell");
for (var i=0; i<60; i++) // 60 minutes
{
WshShell.SendKeys('+');
WScript.Sleep (60000);
}

Step by step instructions for non technical people:
  1. Press the Windows key and E to start the Windows Explorer
  2. In the Explorer, navigate to Desktop
  3. Remove the setting for "Hide extensions for known file types" - this is done differently from Windows version to Windows version, so google it
  4. Create a new text file on the desktop by right clicking in it and choosing "New Text Document"
  5. Paste the code above in it
  6. Save the file (if you successfully removed the setting at point 3, you should not only see the name, but also the .txt extension for the file)
  7. Rename the file to busybee.js (or any name, as long as it ends with .js
  8. Double click it

The script will run 60 times at every minute (so for an hour) and keep the machine on which it runs on from going idle. Enjoy!

Displaying a message from a T-SQL query in real time

The preferred method to display anything in Transact-SQL is PRINT. You can print a string, a variable, an expression. However, as anyone soon finds out, the message get all cached in a buffer and displayed after the entire query ends. So if you have several long running queries in a single batch and you want to get real time messages from them, PRINT doesn't work. A quick search directs you to another MS SQL directive: RAISERROR (note the creative spelling that makes one think more of hearing Katy Perry RROR rather than a proper error raising directive). Also note that Microsoft recommends using a new construct called THROW, introduced in SQL2012. However, it only looks like a lamer version of RAISERROR. They both send a message to the client instantly, but the problem they have is that they do not, as PRINT does, accept an expression. So if you want to print something like 'The computed key from the query is: '+@Key you are out of luck as you need to declare a new nvarchar variable, fill it with the message value then use it in RAISERROR.

But there is a better solution. RAISERROR not only throws something at the client, it also flushes the message cache. So something like this works: PRINT 'The computed key from the query is: '+@Key; RAISERROR('',0,1) WITH NOWAIT;.

Of course, being the dev that I am, I wanted to encapsulate this into something that would be reusable and also get rid of the need do use plus signs and conversion to NVARCHAR, so I created this procedure that almost works like PRINT should have:
CREATE PROCEDURE Write (@P1  NVARCHAR(max)=NULL, 
                        @P2  NVARCHAR(max)=NULL, 
                        @P3  NVARCHAR(max)=NULL, 
                        @P4  NVARCHAR(max)=NULL, 
                        @P5  NVARCHAR(max)=NULL, 
                        @P6  NVARCHAR(max)=NULL, 
                        @P7  NVARCHAR(max)=NULL, 
                        @P8  NVARCHAR(max)=NULL, 
                        @P9  NVARCHAR(max)=NULL, 
                        @P10 NVARCHAR(max)=NULL) 
AS 
    PRINT Isnull(@P1, '') + Isnull(@P2, '') 
          + Isnull(@P3, '') + Isnull(@P4, '') 
          + Isnull(@P5, '') + Isnull(@P6, '') 
          + Isnull(@P7, '') + Isnull(@P8, '') 
          + Isnull(@P9, '') + Isnull(@P10, '') 

    RAISERROR('',0,1) 

And you use it as DECLARE @now DATETIME = GetDate()
Write 'The date today is ',@now
. Nice, huh? Of course what you would have liked to do is Write 'The date today is ',GetDate(), but apparently stored procedures do not accept functions as parameters, and functions do not accept PRINT inside them.

Sunday, November 17, 2013

Are all love stories fantasies?

I remember when I was a kid in highschool and wanted so much to experience that amazing feeling books and movies described, the magical bond between man and woman, the thing that makes everything worthwhile and even death irrelevant. So I had the feeling. It was amazing, it was magical, it was everything I expected it to be and more. And yet it all turned out to be an illusion, a manufactured fantasy of my own making that just tumbled down like a castle made of cards. And I always thought that was sort of weird and uncommon, but now I am thinking that maybe all romance is like that.

Could it be that love is something we are programmed to need to feel and we are just doing all is necessary in order to have the feeling into a familiar and comforting setting? All other sensations and feelings and needs are accounted for: you are happy because something good happened, you are hungry because you haven't eaten, you are disgusted because someone did something, you want to belong because we are a social species, but love is something that doesn't make a lot of sense, it's just there. It also feels too personal to assign it under "a species thing", even if its ultimate purpose is just that. So, in order to justify it, you need an origin story, just like any good comic book hero.

Thus the myth is born, perpetuated, kept alive as long as you have the feeling. And when you break a relationship up, again there are these justifications, stuff that just appears out of thin air and one or both of the people involved are always shocked, because they weren't there five minutes ago, when people were in love. "You killed my parents!" "You threw me in acid!" and so on. I don't want to stretch the superhero thing too much, though. Alternatively, think back and try to imagine all of your relationships as movies in the Kick Ass franchise. :)

So this is how I am seeing this: you meet this other person, you develop feelings of affection (or maybe just carnal lust, I don't care) and your brain starts churning up this script in the background. Once the script is ready, the focus group enjoyed it, the producers secured the financing, you already have the cast and bang! you have a blockbuster movie you call love. Until then it was just... less, "simple" affection or enjoyment, but with the origin story it becomes the superfeeling, capable of incredible feats, worthy of songs and tales.

As an argument for this vision I bring the terrible reluctance of people to dispel the magical script of their love affair. When the mystery of their feelings is laid bare they feel they have lost something. When outsiders are trying to look at it logically or to deconstruct the story they meet with fanatical resistance. After all, if you are an artist, making your own movie and starring in it, you really hate critics.

And when you are in a long relationship, one that spans more than the fabled three years, you continue to have the feelings, now intertwined with practical issues like who gets the car, who walks the dog, but they don't look so great because the love movie, as cool as it was, is old. You remember those times, maybe share them with your friends, just like remembering a good movie (made even better in your head by the bad parts you chose to conveniently forget) and your friends' reactions are just like for a movie, either sharing their own or criticizing yours or agreeing it was cool, maybe even getting the old "they don't make them like they used to" speech.

Of course, in order to definitively prove such a thesis, one must delve into the idea of what is real and what is not. That's a can of worms I am not opening. And believing every love story is a fantasy that covers up more mundane things and in which everyone involved participates willingly doesn't diminish its power. You can enjoy a movie even when you know it's not real, especially because you know it's not real. I also have to say something about the nature of control in a love situation. It's not clear to me, yet, but the gist of it is that this is why people do enjoy playing romantic games of seduction and betrayal, but ultimately dismiss the practice as not serious. It's the same kind of behaviour one sees in parents that scoff at their children's games, but watch TV all day. The thing is in a story you don't have control. It becomes more and more powerful the more the characters can say it all happened to them, rather than being something they planned. Somehow having control in a romantic relationship is frowned upon and having a good story on why losing control was a good thing is a form of art.

I'll end with a fragment of a movie I really liked, one that you probably don't associate with romance: The Name of the Rose. There was this scene where a girl, dirty and inarticulate, scavenging in the castle's garbage, meets our young protagonist interpreted by Christian Slater and just humps him like there is no tomorrow, groaning and moaning like a wild animal for the whole five minutes; then she leaves. Later on, when people are trying to burn her as a witch, the boy saves her. At the time I found that very romantic, even if they didn't even share a word. Maybe it's the fact that he gets to save her in the end, but most likely it's Slater's performance of looking at the girl with a longing and very confused gaze, the "what the hell is happening to me?" look that betrays the background script writing in the back of his skull.

Friday, November 08, 2013

The magical world of advertising

There is nothing more fantastic than the advertising world. Whether it is a brilliant spot, winning prize after prize at ad competitions, or a formulaic and horribly boring video of a housewife getting aroused by the newest household products, marketing and advertising items have one thing in common: they can't possibly be true. Maniacal teens getting their faces stretched by hideously wide grins cannot possibly love a soft drink or a gum so much. The housewife I previously mentioned probably has a job and children to take care of on the side, chances are there is little that actually makes her happy and absolutely nothing that can make her ecstatic about cleaning products. The newest app is never as good as the banner annoyingly appearing over your web site indicates, nor is the latest enterprise software doing the job it's supposed to do beyond a superficial number of typical scenarios. No teen finds happiness by buying the latest smartphone or opting in the latest two year cell subscription.

The governments everywhere are not doing their job, not keeping the promises they made during electoral campaigns, companies try to hide their problems under the rug while bragging to anyone that listens of the quality of their staff and products, human resources hire people under false pretenses while employees advertise skills that they never had. Boys proffer their love to girls and they in turn promise to have actual sex after the relationship steadies. Security companies, national or commercial, are only interested in their own security rather than yours. People that sarcastically make fun of other people's commercials are never as cool as they attempt to look.

I don't know about you, but in the rare cases when software doesn't manage to remove ads completely from my browser or smart phone, the fact that the ads appear in front of me has no effect other than mild (or strong) annoyance. My mind filters out the flashing images, the over the top colors and embellished fonts, anyway. Even the user content and comments that have been touted as the soul of Web 2.0 (or is it 3.0 now?) which would allow me to feel connected and get the information I need, they either reek of aggressive stupidity or bring little useful information to the table. I don't care what the top N items are in any list, nor am I interested in what other people Liked or not on Facebook. I never give money to beggars and don't buy stuff from loud people.

As far I can see, there is nothing that can be called advertising that is actually true or at least beneficial. So I have to wonder, in view of this impression of mine, could it be possible that the promise of advertising as a whole is just a lie as well, that the advertising of the marketing science and art is just as much bullshit as anything else the comes out of their world? Could it be that advertising's greatest feat is to make us think it actually works?

Monday, November 04, 2013

We, by Yevgeny Zamyatin (Eugene Zamiatin)

Book cover We is a classic book, written by Russian author Yevgeny Zamyatin, who envisioned how the future in the hands of the Bolsheviks will work against the individual. He imagined a world in which all people shave their head, wear the same uniform and have numbers instead of names and live in completely transparent buildings, with no privacy. As the preface of the book there are three commentaries on Zamiatin's genius, history as a writer in Russia, then his voluntary exile and the interdiction of his works in Russia.

So did I like the book? No. To say that is dated is an understatement, but it's not only that. The style is that of a diary, but one in which dialogues between people are described verbatim. The way they speak is emotional, hysterical, they interrupt each other and words are incoherent, so contradictory to the premise of the book. Even the inner thoughts of the main character are chaotic, childish, delirious, with wild mood swings. It was so horrible that, after deciding that 200 PDF pages were not going to be a lot of reading, I stopped at 50 and started skipping ahead. The entire book is like that.

Maybe Zamiatin was a great writer, but I don't see it. Just because he was Russian is no excuse for his incoherence. It felt like he wrote the book while intoxicated; with too much coffee. Even if he starts his book with Dear readers, you must think, at least a little. It helps, I believe that it was his duty as a writer to communicate his ideas in a digestible form. Maybe thinking is different from person to person, Zamiatin. If I think about it, the story within is similar to the movie Equilibrium, only that was more action based, as any American movie is prone to be, and watchable. The same champion of the system being seduced (through a woman, how else?) by the emotions that he was sworn to fight against.

About the subject of the book, I can say that stories about rebels fighting Utopian regimes are making me feel conflicted. I care about the personal desires of the individual, but I also care about the overall system. One must understand that a centralized and overall controlled system is as disturbing to a person that experienced individual freedom as is a decentralized system based on individuals is for someone that lived in a different environment. To assume that one or the other knows better is just arrogant and stupid. The worse thing that can happen, in my view, is to only have one acceptable system that, even if we could leave one country for another, would be the same all over the world.

On the other hand, once you, as an individual, decide that something is right and something is wrong, you have a responsibility to act. I just remember that quote Slaves dream not of freedom, but of becoming masters. It would be a lot easier for me to accept people rebelling against systems if they would stop attempting to change the world for everybody, not just themselves.

Well, I will leave you with a quote from the book, one that seemed eerily contemporary: If human liberty is equal to zero, man does not commit any crime.