Friday, May 02, 2014

Adventures in the .Net file system library System.IO

Intro (click to hide)
Sometimes, after making some software or another and feeling all proud that everything works, I get an annoying exception that the path of the files my program uses is too long, the horrible PathTooLongException. At first I thought it was a filesystem thing or an operating system thing, or even a .Net version thing, but it is not. The exception would appear in .Net 4.5 apps running on Windows 7 systems that used NTFS on modern hardware.

Lazy as any software dev, I googled for it and found a quick and dirty replacement for System.IO called Delimon that mitigated the issue. This library is trying to do most of what System.IO does, but using the so called extended-length paths, stuff that looks like \\?\C:\Windows, with external functions accessed directly from kernel32.dll. All good and nice, but the library is hardly perfect. It has bugs, it doesn't implement all of the System.IO functionality and feels wrong. Why would Microsoft, the great and mighty, allow such a ridiculous problem to persist in their core filesystem library?

And the answer is: because it is a core library. Probably they would have to make an enormous testing effort to change anything there. It is something from a managed code developer nightmare: unsafe access to system libraries, code that spans decades of work and a maze of internal fields, attributes, methods, properties that can be used only from inside the library itself. Not to mention all those people who decided to solve problems in core classes using reflection and stuff. My guess is that they probably want to replace file system usage with another API, like Windows.Storage, which, alas, are only used for Windows Phone.

In this blog post I will discuss the System.IO problems that relate to the total length of a path, what causes them and possible solutions (if any).

Let's start with the exception itself: PathTooLongException. Looking for usages of the exception in the mscorlib.dll assembly of .Net 4.0 we see some interesting things. First of all, there is a direct translation from Windows IO error code 206 to this exception, so that means that, in principle, there should be no managed code throwing this exception at all. The operating system should complain if there is a path length issue. But that is not the case in System.IO.

Most of the other usages of the exception come from the class PathHelper, a helper class used by the System.IO.Path class in a single method: NormalizePath. Wonderful method, that: internal static unsafe. PathHelper is like a multiple personality class, the active one being determined by the field useStackAlloc. If set to true, then it uses memory and speed optimized code, but assumes that the longest path will always be 260. That's a constant, it is not something read from the operating system. If set to false, the max path length is also provided as a parameter. Obviously, useStackAlloc is set to true in most situations. We will talk about NormalizePath a bit later.

The other usages of the PathTooLongException class come from two Directory classes: Directory and LongPathDirectory. If you instantly thought "Oh, God, I didn't know there was a LongPathDirectory class! We can use that and all problems disappear!", I have bad news for you. LongPathDirectory is an internal class. Other than that it seems to be a copy paste clone of Directory that uses Path.LongMaxPath instead of hardcoded constants (248 maximum directory name length, for example) or... Path.MaxPath. If you thought MaxPath and LongMaxPath are properties that can be set to fix long path problems, I have bad news for you: they are internal constants set to 260 and 32000, respectively. Who uses this LongPathDirectory class, though? The System.IO.IsolatedStorage namespace. We'll get back to this in a moment.

Back to Path.NormalizePath. It is a nightmare method that uses a lot of internal constants, system calls, convoluted code; it seems like someone deliberately tried to obfuscate its code. It's an internal method, of course, which makes no sense, as the functionality of path normalization would be useful in a lot of scenarios. Its first parameter is path, then fullCheck, which when true leads to extra character validation. The fourth parameter is expandShortPaths which calls the GetLongPathName function of kernel32.dll. The third parameter is more interesting, it specifies the maximum path length which is sent to PathHelper or makes local checks on the path length. But who uses this parameter?

And now we find a familiar pattern: there is a class (internal of course) called LongPath, which seems to be a clone of Path, only designed to work with long paths. Who uses LongPath? LongPathDirectory, LongPathFile and classes in the System.IO.IsolatedStorage namespace!

So, another idea becomes apparent. Can we use System.IO.IsolatedStorage to have a nice access to the file system? No we can't. For at least two reasons. First of all, the isolated storage paradigm is different from what we want to achieve, it doesn't access the raw file system, instead it isolates files in containers that are accessible to that machine, domain, application or user only. Second, even if we could get an "isolated" store that would represent the file system - which we can't, we would still have to contend with the string based way in which IsolatedStorage works. It is interesting to note that IsolatedStorage is pretty much deprecated by the Windows 8 Windows.Storage API, so forget about it. Yeah, so we have LongPathDirectory, LongPathFile and LongPath classes, but we can't really use them. Besides, what we want is something more akin to DirectoryInfo and FileInfo, which have no LongPath versions.

What can we do about it, then? One solution is to use Delimon. It has some bugs, but they can be avoided or fixed, either by the developers or by getting the source/decompiling the library and fixing the bugs yourself. A limited, but functional solution.
An interesting alternative is to use libraries the BCL team published for long path access: LongPath which seems to contain classes similar to the ones we find in mscorlib, but it's latest release is from 2010 or Zeta long paths which has a more recent release, 2013, but is completely different, using the FileInfo and DirectoryInfo paradigm, too.

Of course, you can always make your own API.

Another solution is to be aware of the places where the length limitation appears and avoid them via other type of development, in other words, a file system best practices compilation that eventually turns into a new file system API.

Both solutions coalesce into using some other library instead of System.IO. That's horrible and I think a stain on .Net's honor!

So let's see where the exception is actually thrown.

I've made some tests. First of all, I used FAR Manager, a file manager, to create folders of various lengths. The longest one was 255, before I got an exception. To my surprise, Windows Explorer could see it, but it could not open or copy/rename/delete it. I reduced the size of its name until the total size of the path was 260, then I could manipulate it in Windows Explorer. So there are external reasons for not creating paths as long, but we see that there are tools that can access files like that. Let's attempt to create some programatically.

System.IO.Directory.CreateDirectory immediately fires the exception. DirectoryInfo has no problem instantiating with the long path as the parameter, but the Create method throws the same exception. Any attempt to create a folder of more than 248 characters, even if the total path was less than 260 characters, failed as well.

However, with reflection to the rescue, I could create paths as long as 32000 characters and folders with names as long as 255 characters using our friend LongPathDirectory:
var longPathDirectoryType = typeof(System.IO.Directory).Assembly.GetTypes().First(t=>t.Name=="LongPathDirectory");
var createDirectoryMethodInfo = longPathDirectoryType.GetMethod("CreateDirectory", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.NonPublic);
createDirectoryMethodInfo.Invoke(null, new object[] { path });

What about files? FAR Manager threw the same errors if I tried to create a filename larger than 255 characters. Let's try to create the same programmatically.

File.Create threw the exception, as well as FileInfo.Create and the FileStream constructors.

So can we use the same method and use LongPathFile? No! Because LongPathFile doesn't have the creating and opening functionalities of File. Instead, FileStream has a constructor that specifies useLongPath. It is internal, of course, and used only by IsolatedStorageFileStream!

Code to create a file:
var fileStreamConstructorInfo = typeof(System.IO.FileStream).GetConstructor(System.Reflection.BindingFlags.NonPublic|System.Reflection.BindingFlags.Instance,null,
        new Type[] { 
        typeof(string) /*path*/, typeof(System.IO.FileMode) /*mode*/, typeof(System.IO.FileAccess) /*access*/, 
        typeof(System.IO.FileShare) /*share*/, typeof(int) /*bufferSize*/, typeof(System.IO.FileOptions) /*options*/, 
        typeof(string) /*msgPath*/, typeof(bool) /*bFromProxy*/, typeof(bool) /*useLongPath*/, typeof(bool) /*checkHost*/},null);
    var stream = (System.IO.Stream)fileStreamConstructorInfo.Invoke(new object[] {
        path, System.IO.FileMode.Create, System.IO.FileAccess.Write,
        System.IO.FileShare.None, 4096, System.IO.FileOptions.None,
        System.IO.Path.GetFileName(path), false, true, false
Horrible, but it works. Again, no filenames bigger than 255 and the exception coming from the file system, as it should. Some info about the parameters: msgPath is the name of the file opened by the stream, if bFromProxy is true the stream doesn't try to demand some security permissions, checkHost... does nothing :) Probably someone wanted to add a piece of code there, but forgot about it.

Why did I use 4096 as the buffer size? Because that is the default value used by .Net when not specifying the value. Kind of low, right?

Now, this is some sort of midway alternative to using a third party file system library: you invoke through reflection code that is done by Microsoft and hidden for no good reason. I don't condone using it, unless you really need it. What I expect from the .Net framework is that it takes care of all this for me. It seems (as detailed in this blog post), that efforts are indeed made. A little late, I'd say, but still.


jogibear9988 said...

There is also: as alternative