Sunday 23 June 2013

Some Notes on IDisposable and Finalizers

Every now and then I still find myself scratching my head when adding a finalizer to a class or implementing IDisposable in a C# application, so I thought I would do a quick write-up that will surely come in handy next time. By the way, I love JavaScript and firmly believe one can write incredibly beautiful and complex code with it, but at the same time it keeps you so shielded from resource management and concurrency that basic knowledge of these areas suddenly looks like "hardcore programming".

There are tons of resources around about this, but this is one of the most complete ones. From that and many other articles, we should know that the common pattern for a class implementing IDisposable and containing a Finalizer is something like this:

public class DisposableClass : IDisposable
{
    ~DisposableClass()
    {
        Dispose(false);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Clean up all managed resources
        }
            
        // Clean up all native resources
    }
}

So far so good. I'm not going to explain how the above works, or how the GC and finalization work; that's explained in detail in thousands of places (indeed, I also wrote about .Net's GC before, [1] and [2]). I'll just recap below a few points that seem useful to me:

  1. So, when should I care about implementing such a pattern in my own classes?
    Basically, if your class directly owns an unmanaged resource (mainly a Windows handle) you'll need to write a finalizer that takes care of releasing that unmanaged resource.
  2. OK, and then, does having a finalizer mean that my class has to implement IDisposable?
    In short, yes. Well, when you write a finalizer you make sure that the unmanaged resources will eventually be released by the GC-Finalizer mafia. As GCs take place in a non-deterministic way, you can play better with the whole system by allowing consumers (owners) of your objects to release these resources when they are done with your objects. For this, you should implement IDisposable, so that these other guys can call your Dispose() method directly (or indirectly by means of the using statement). Notice that, as reflected in the code above, such a Dispose should invoke GC.SuppressFinalize; this way the finalizer will not be run for your object, which spares the GC and the finalizer thread some work.
    This said, someone noted in StackOverflow that there are a few classes in the Framework (e.g. Threading.Thread and WeakReference) which have finalizers but do not implement IDisposable.
  3. Understood, but one more thing: does it make sense to implement IDisposable without having a finalizer?
    Yes it does. If your class does not hold unmanaged resources, it won't have a finalizer, but it could hold instances of other IDisposable classes. In that case, you must implement IDisposable and invoke Dispose() on those instances from your own Dispose(). Furthermore, your objects may want to run some other final actions when they are no longer needed (write to a log...), so that should also go in your Dispose, and well-behaved consumers will remember to invoke it.
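
As a rough illustration of point 3, here is a minimal sketch of a class with no finalizer that implements IDisposable only because it owns other disposables. The LogWriter class, its members and the message text are made up for the example; the only real APIs used are MemoryStream, StreamWriter and ObjectDisposedException.

```csharp
using System;
using System.IO;

// Hypothetical class: no unmanaged resources, hence no finalizer,
// but it owns two IDisposable instances that it must release.
public sealed class LogWriter : IDisposable
{
    private readonly MemoryStream stream = new MemoryStream();
    private readonly StreamWriter writer;
    private bool disposed;

    public LogWriter()
    {
        writer = new StreamWriter(stream);
    }

    public bool IsDisposed { get { return disposed; } }

    public void Write(string message)
    {
        if (disposed) throw new ObjectDisposedException("LogWriter");
        writer.WriteLine(message);
    }

    public void Dispose()
    {
        if (disposed) return;   // calling Dispose twice must be harmless
        writer.Dispose();       // flushes and closes the underlying stream too
        stream.Dispose();
        disposed = true;
        // No GC.SuppressFinalize here: there is no finalizer to suppress.
    }
}

public static class Program
{
    public static void Main()
    {
        var log = new LogWriter();
        using (log)
        {
            log.Write("playing a riff");
        }
        Console.WriteLine(log.IsDisposed); // True
    }
}
```

Since the class is sealed and has no finalizer, the Dispose(bool) overload of the full pattern is not needed here.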

An important point to bear in mind is being careful in our finalizers not to invoke actions on other objects that may no longer be valid (because their finalizers have run earlier than ours). I mentioned it here, saying that you should refrain from calling a finalizable object from your finalizer. Well, indeed it's not just calling into finalizable objects; it's calling into any object implementing IDisposable that you have to avoid.
Let's say we have:

  • class A has a finalizer, is IDisposable and holds a reference to class B.
  • class B has no finalizer, but is IDisposable because it holds a reference to class C.
  • class C has a finalizer and is IDisposable.

If A's finalizer called a method in B, that method could in turn invoke some functionality in C. If C's finalizer had already run (which is possible, as finalizers run in an undetermined order), we would have a problem there.

Before ending this post I'd like to mention how the special syntax (~MyFinalizer) defined by the C# team to make writing finalizers easier can cause some confusion for people with a C++ background.

Annotation (Joe Duffy): Earlier in the .NET Framework’s lifetime, finalizers were consistently referred to as destructors by C# programmers. As we become smarter over time, we are trying to come to terms with the fact that the Dispose method is really more equivalent to a C++ destructor (deterministic), while the finalizer is something entirely separate (nondeterministic). The fact that C# borrowed the C++ destructor syntax (i.e. ~T()) surely had at least a little to do with the development of this misnomer. Confusing the two has been unhealthy in general for the platform, and as we move forward the clear distinction between resource and object lifetime needs to take firm root in each and every managed software engineer’s head.

Saturday 22 June 2013

Caller's Complete Name

Let's say we have a C# application where a method wants to know the full name (namespace, class name and method name) of the method that is invoking it (for saving some statistics, for example). We have several ways to do this, though none of them is as elegant as I'd like.

  1. The called method takes care of this on its own, so the caller does not need to pass any additional parameter. It sounds like the ideal solution in terms of design, but it's quite a performance killer, as we'll be using stack traces for it.
    // needs: using System.Diagnostics; using System.Reflection;
    public void PlayRiff()
    {
        // frame 0 is PlayRiff itself, frame 1 is its caller
        MethodBase methodBase = new StackTrace().GetFrame(1).GetMethod();
        Console.WriteLine(methodBase.DeclaringType.FullName + "." + methodBase.Name);
    }
    
    // in the caller:
    this.Guitar.PlayRiff();
    
  2. C# 5 added the CallerMemberName attribute. It's very helpful for things like INotifyPropertyChanged. The thing is that it only provides the method (or property) name, but not the class where it resides, so it's not enough for this case. I guess a good addition for C# 6 would be something like CallerFullName.
    // this would be the ideal solution... added to my C# 6 wishlist
    // public void PlayRiff4([CallerFullName] string fullCaller = null)
    // {
    //     Console.WriteLine(fullCaller);
    // }
    
    // this.Guitar.PlayRiff4();
    
  3. It seems like we'll have to provide the information manually from the caller. Well, the simplest solution is just to pass a string like this:
    public void PlayRiff3(string fullCaller)
    {
        Console.WriteLine(fullCaller);
    }
    
    this.Guitar.PlayRiff3("Test.Musician.Play");
    
    Such a hardcoded string is a pretty bad solution as we'll have to remember to update it each time we change the name of the namespaces, class or method, so this is like a recipe for failure.
  4. The clear winner for me is passing the type by means of this.GetType().FullName, and the method name with the aforementioned CallerMemberName.
    public void PlayRiff2(string typeStr, [CallerMemberName] string caller = null)
    {
        Console.WriteLine(typeStr + "." + caller);
    }
    
    this.Guitar.PlayRiff2(this.GetType().FullName);
    this.Guitar.PlayRiff2(typeof(Musician).FullName);
    
    Rather than GetType(), you could use typeof(MyClass). Future changes to the class name would not pose any problem, as any refactoring tool would update the expression, but it seems less natural to me (and I don't think there are any significant performance differences). Bear in mind, though, that GetType() returns the runtime type, so when the call happens in a method inherited by a derived class it reports the derived class name, while typeof(MyClass) always reports MyClass.
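
To make option 4 concrete, here is a small self-contained sketch. The Guitar and Musician names follow the snippets above, but SessionMusician, the method names and the fact that PlayRiff2 returns the string instead of printing it are my own additions for the example. It also shows the one case where GetType() and typeof differ: a derived class calling an inherited method.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Guitar
{
    // Returns the caller's full name instead of printing it,
    // only so that the result is easy to inspect.
    public string PlayRiff2(string typeStr, [CallerMemberName] string caller = null)
    {
        return typeStr + "." + caller;
    }
}

public class Musician
{
    private readonly Guitar guitar = new Guitar();

    public string PlayWithGetType()
    {
        // Runtime type: reports the derived class when inherited.
        return guitar.PlayRiff2(this.GetType().FullName);
    }

    public string PlayWithTypeof()
    {
        // Compile-time type: always the class where this code lives.
        return guitar.PlayRiff2(typeof(Musician).FullName);
    }
}

// Hypothetical subclass, only here to show the difference.
public class SessionMusician : Musician { }

public static class Program
{
    public static void Main()
    {
        var player = new SessionMusician();
        Console.WriteLine(player.PlayWithGetType()); // SessionMusician.PlayWithGetType
        Console.WriteLine(player.PlayWithTypeof());  // Musician.PlayWithTypeof
    }
}
```

Which of the two you want depends on whether the statistics should be grouped by the class that declares the method or by the concrete class of the instance.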

You can get the code here.

Tuesday 11 June 2013

Java vs .Net: String.equals, interning

As someone who reads and writes code in C# and JavaScript for most of his (programming) time, I still get puzzled each time I look at some Java code and find a String.equals(). Yes, I know the reason for this (anyone with a minimum of Java background needs to know it), but it still seems odd to me, and even odder to know that using "==" could give different results depending on an implementation detail like string interning.

So, first of all, a comparison can mean 2 different things: Identity/Reference equality and Value equality. This post puts it nicely:

  • Identity (reference equality)
    Two objects are identical if they actually are the same object in memory. That is, references to them point to the same memory address.
  • Equivalence (value equality)
    Two objects are equivalent if the value or values they contain are the same.

In C#, Java and JavaScript alike, when we compare Value/Primitive types we expect an equivalence comparison, and when we compare Reference types (Objects) we mainly expect a reference comparison. However, a few things get in the way and make matters more complicated: boxing (and caching), and strings (and interning). Strings are Objects/Reference types (well, not in JavaScript, where they are primitive types), but when comparing them we would usually prefer value semantics. I mean, you usually don't care whether 2 strings are the same object (the same bytes at the same memory address), but whether they have the same value. Java and C# follow different policies here.

C# sticks to value semantics, and when comparing 2 strings with "==" it applies value equality. It does this by overloading the "==" operator for the String class. If it hadn't been overloaded, "==" would do a reference comparison (as it does for other objects). Operator overloading is based on static methods, so it's resolved at compile time from the static types of the operands, which has really interesting implications, as sharply explained by Eric Lippert.

//C# code:
object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;
Console.WriteLine(obj == str1); // true (Reference equality and both strings are interned)
Console.WriteLine(str1 == str2); // true (value equality)
Console.WriteLine(obj == str2); // false !? (Reference equality and the 2nd string is not interned)

Java does not feature operator overloading (I have mixed feelings about operator overloading, so I wouldn't necessarily say that leaving it out was a bad thing), so it would not be easy to justify "==" behaving differently for Strings than for other objects. Therefore str1 == str2 does a reference comparison, and that's why you have to use String.equals, which does a value comparison. The odd thing is that, because of interning, sometimes == can seem to be doing a value comparison. Let's say we have:

//Java code:
String s1 = "hi"; //literal string, so it's interned
String s2 = "hi"; //literal string, so it's interned
s1 == s2; //true
String s3 = new String("hi"); //no interning
s2 == s3; //false

Being interned means that s1 and s2 point to the same place in the string pool, so the reference comparison will be true. However, s3 is not interned, so it's a different chunk of memory, and the comparison will be false. This answer in StackOverflow summarizes it pretty well:

== tests for reference equality.
.equals() tests for value equality.

Consequently, if you actually want to test whether two strings have the same value you should use .equals() (except in a few situations where you can guarantee that two strings with the same value will be represented by the same object eg: String interning).

In C#, on the contrary, whether a string is interned or not has no effect on the result of a "==" comparison. As it performs value equality, it makes no difference whether the strings are really the same object in the intern pool or distinct objects on the heap. Well, C#'s "==" (which in the end calls String.Equals()) will first do a reference check, so it can return true immediately if both strings are the same interned instance, thereby sparing a longer char-by-char comparison. This is brilliantly explained here (I'd always thought of interning as a memory optimization, not as a processing optimization, and the point he brings up about using a string.intern() before a switch is really interesting).
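
A quick way to see this for yourself is the sketch below. InterningDemo and FreshCopy are names I made up; the helper just builds a new string instance with the same characters, so that we have two equal strings that are not the same object.

```csharp
using System;

public static class InterningDemo
{
    // Builds a brand new string object with the same characters.
    public static string FreshCopy(string s)
    {
        return new string(s.ToCharArray());
    }
}

public static class Program
{
    public static void Main()
    {
        string literal = "hello";
        string copy = InterningDemo.FreshCopy(literal);

        // Value equality: interning plays no role in the result.
        Console.WriteLine(literal == copy);                        // True

        // The two variables really are different objects.
        Console.WriteLine(object.ReferenceEquals(literal, copy));  // False

        // Interning two equal strings always yields the same instance.
        string first = string.Intern(copy);
        string second = string.Intern(InterningDemo.FreshCopy(literal));
        Console.WriteLine(object.ReferenceEquals(first, second));  // True
    }
}
```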

It's also worth mentioning boxing when dealing with equality. As expected, in C# (same as in JavaScript with Number objects) when we box 2 integers and compare the boxes with "==" we get a reference comparison, so the result is false even if both hold the same value.
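
A minimal C# sketch of that behavior (BoxingDemo and its method names are just for illustration): each assignment to object creates a separate box, so "==" on the two object references compares identities, while Equals compares the boxed values.

```csharp
using System;

public static class BoxingDemo
{
    public static bool ReferenceCompare(int x, int y)
    {
        object a = x;   // boxes x
        object b = y;   // boxes y into a different object
        return a == b;  // object == object: reference comparison
    }

    public static bool ValueCompare(int x, int y)
    {
        object a = x;
        object b = y;
        return a.Equals(b); // Int32.Equals: compares the boxed values
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(BoxingDemo.ReferenceCompare(5, 5)); // False
        Console.WriteLine(BoxingDemo.ValueCompare(5, 5));     // True
    }
}
```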

I haven't run the test in Java because I don't know of any Java REPL and I don't feel like going to the trouble of writing a Program.java for this... but this discussion in StackOverflow really bewildered me, because due to caching the result for small numbers would be true:

If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

Friday 7 June 2013

Cookies and localhost

I really hate it when I can't understand something, it hurts me and makes my mood harsher than usual. It's been a long while since I came to the realization that I'm not a particularly brilliant person, so I don't aspire to understanding Quantum Physics... but I really can't bear it when things that I consider to be within my intellectual realm decide to elude me. Today has been one of those days...

The thing was quite simple: I had a Web Application (well, sort of... the real thing is a bit more complicated) that was creating a cookie and sending it to the client (a sort of AuthId that would allow us to identify the user as already authenticated). I could see with Firebug or the Chrome Dev Tools that the server was sending a Set-Cookie response header, but the ensuing requests would not send the cookie back (in a Cookie request header). After carefully verifying that I was correctly setting the Domain, Expires and Path values of the cookie, I thought it could be due to the fact that the response setting the cookie carried a redirection status code (307). Some googling didn't bring up any significant findings in that sense... so I ended up redesigning the thing to avoid the need for cookies. Anyway, the pain of not being able to make such an apparently trivial solution work kept gnawing at me (mainly because these things tend to make me think that, just as I was wrong about this, I could be wrong about other related items). So, all of a sudden something sprang to mind: "could it be something with cookies and localhost!?"

And yes it was. To my astonishment I found this discussion stating:

by design domain names must have at least two dots otherwise browser will say they are invalid (see reference on http://curl.haxx.se/rfc/cookie_spec.html)

when working on localhost (!) the cookie-domain must be set to "" or NULL or FALSE instead of "localhost"

The solution is completely correct: setting Domain to null makes the whole thing work nicely and lets my self-esteem return to its normal levels (not very high, but enough to get by...), but the thing leaves me wondering about the purpose of this odd "localhost" behavior.
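
The workaround can be sketched as follows. This is not the code of my application: CookieHeader.Build is a made-up helper that composes a Set-Cookie header value by hand, and the cookie name and value are placeholders. The only point is that the Domain attribute is left out when the host is localhost.

```csharp
using System;
using System.Text;

public static class CookieHeader
{
    // Hypothetical helper: builds the value of a Set-Cookie header.
    public static string Build(string name, string value, string host)
    {
        var header = new StringBuilder();
        header.Append(name).Append('=').Append(value).Append("; Path=/");

        bool isLocalhost =
            string.Equals(host, "localhost", StringComparison.OrdinalIgnoreCase);

        // For localhost we omit Domain entirely, which is what setting
        // the cookie domain to null amounts to.
        if (!isLocalhost)
        {
            header.Append("; Domain=").Append(host);
        }
        return header.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CookieHeader.Build("AuthId", "abc123", "localhost"));
        // AuthId=abc123; Path=/
        Console.WriteLine(CookieHeader.Build("AuthId", "abc123", "www.example.com"));
        // AuthId=abc123; Path=/; Domain=www.example.com
    }
}
```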

Sunday 2 June 2013

Path vs Content in Method Parameter

Time for some irrelevant musing over some design details in 2 well known .Net libraries.

The other day, while reading the documentation of log4net, the signature of one of its methods caught my attention. I'm referring to the use of a FileInfo parameter in one of the XmlConfigurator.Configure overloads. I can't recall having seen this technique before in any other library, but at first sight it seemed quite interesting for how clearly it conveys the intent of the method. If you were passing a string, you would need to read the documentation to know whether that string is a path or a block of config values. With a FileInfo parameter it's crystal clear that the method expects a path. On the other hand, as somebody says in this good StackOverflow discussion, most of the time you'll probably find yourself doing something like Configure(new FileInfo(path)), without leveraging any of the additional functionality of FileInfo, which seems like a waste of resources. On second thought, I think I would favor a design like this:
XmlConfigurator.ConfigureFromFile(string path);

This has made me think about how other popular libraries deal with similar issues. The first case that came to mind is XmlDocument in .Net's BCL and its Load overloads. We find some Load methods expecting Readers or a Stream (notice that log4net also has a Configure overload expecting a Stream), whose intent is pretty clear, and then we find a Load method receiving a string, which the documentation says is a path. Then we find a LoadXml method expecting a string, which the documentation explains is an XML block.

Well, I would say this naming is confusing, and actually I can remember one case where it led me to write wrong code (by using autocomplete without reading the docs). As all the other Load methods receive the XML data (either in a Stream or a Reader), I think the Load(string) method should also receive XML data (hence replacing LoadXml), and then there should be a LoadFromFile(string) method.
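
Here is a small sketch of the difference, along with how the name I'm proposing could be bolted on today. XmlDocument.Load(string) and XmlDocument.LoadXml(string) are the real BCL methods; LoadFromFile is a hypothetical extension method of my own, and the sample XML and temp file are just for the demo.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlDocumentExtensions
{
    // Hypothetical name: makes it explicit that the string is a path.
    public static void LoadFromFile(this XmlDocument doc, string path)
    {
        doc.Load(path); // the real Load(string) treats the string as a file name/URL
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = "<config><level>DEBUG</level></config>";

        // The string is the XML content itself.
        var fromContent = new XmlDocument();
        fromContent.LoadXml(xml);

        // The string is a path to a file holding the XML.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, xml);
            var fromFile = new XmlDocument();
            fromFile.LoadFromFile(path);

            Console.WriteLine(fromContent.DocumentElement.Name);                     // config
            Console.WriteLine(fromFile.SelectSingleNode("/config/level").InnerText); // DEBUG
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Passing the XML content to Load(string) by mistake compiles fine and only fails at run time, which is exactly the kind of error a more explicit name would help avoid.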