Tuesday 27 October 2015

Deserialize Runtime Type

In .Net (I guess it's the same in Java), most of the time when we deserialize an object we know its concrete type at compile time, so it's not a problem that the Serializer needs a generic constructor or that the Deserialize method is generic. But what are our options if the specific type won't be known until runtime?

The scenario
Let's say that we have an Interface or a Base class to interact with, but we'll receive the JSON serialized form of one of its implementations or derived classes. For example:


 public abstract class AlgorithmBase
 {
  abstract public int DoCalculation();
 }

 public class SimpleAlgorithm: AlgorithmBase
 {
  public int Coeficient{get;set;}
  public int Var1{get;set;}
  
  public override int DoCalculation()
  {
   Console.WriteLine("SimpleAlgorithm.DoCalculation");
   return 1;
  }
  
 }

This makes perfect sense. Your code will be interacting with AlgorithmBase, but the instance to be deserialized is a SimpleAlgorithm, so on one side you need a way to record the specific type in the serialized JSON string, and on the other side you need to invoke the deserialization code with a type that is only known at runtime.

It turns out that this is pretty easy to achieve with Json.Net. First, let's see our JSON:

{
 "type": "DeserializeRuntimeType.Algorithms.SimpleAlgorithm",
 "data": {
  "coeficient": 5,
  "var1": 8
 }
}

So we need to extract the type from this JSON, and use it to parse the "data" part into an instance of that type. With Json.Net we can parse our JSON string into an in-memory JSON object (JObject). Furthermore, JObject implements IDynamicMetaObjectProvider (well, its JToken base class does), so if used with the dynamic keyword we can conveniently access the different fields with the "." notation. So we can just do this:

dynamic obj = JObject.Parse(jsonString);
string typeStr = obj.type;
JObject jObj = obj.data;

Now, in order to deserialize the data in jObj into an instance of the type named in typeStr we can use two of the overloads of JObject.ToObject. The easy way is to use JObject.ToObject(Type, JsonSerializer) (I'm using the overload that receives a serializer because of the Pascal/camel casing differences between C# and standard JSON):

var serializer = new JsonSerializer()
   {
       ContractResolver = new CamelCasePropertyNamesContractResolver()
   };

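//with a type known at compile time
//AlgorithmBase algorithm = (AlgorithmBase)jObj.ToObject(typeof(SimpleAlgorithm), serializer);

//Type.GetType returns null if it can't find the type; a namespace-qualified name
//is enough for types in the calling assembly, otherwise use the assembly-qualified name
Type tp = Type.GetType(typeStr);

AlgorithmBase algorithm = jObj.ToObject(tp, serializer) as AlgorithmBase;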

The considerably more complicated way is to invoke the generic method JObject.ToObject<T>(serializer) through reflection. Notice that picking the right MethodInfo when there are several overloads is rather awkward.

var serializer = new JsonSerializer()
   {
       ContractResolver = new CamelCasePropertyNamesContractResolver()
   };

Type tp = Type.GetType(typeStr);
//with a type known at compile time it would be just:
//AlgorithmBase algorithm = jObj.ToObject<SimpleAlgorithm>(serializer);

//filter by name too: other generic methods inherited from JToken also take a single parameter
MethodInfo method = typeof(JObject)
    .GetMethods()
    .FirstOrDefault(mt => mt.Name == "ToObject" && mt.IsGenericMethod && mt.GetParameters().Count() == 1);

MethodInfo generic = method.MakeGenericMethod(tp);

AlgorithmBase algorithm = generic.Invoke(jObj, new Object[] {serializer}) as AlgorithmBase;

It's interesting to see how the overload that is the more elegant one for a type known at compile time, jObj.ToObject<SimpleAlgorithm>(serializer), becomes the more convoluted one for a type known only at runtime.
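To make the reflection dance easier to follow without Json.Net in the picture, here is a minimal, self-contained sketch of the same technique. The Box class and its Parse<T> overloads are hypothetical stand-ins that I've made up for illustration (they are not part of Json.Net); the point is just how an open generic method is selected by name and parameter types and then closed over a Type that is only known at runtime.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;

// Hypothetical class with several generic methods, standing in for JObject
public class Box
{
    public T Parse<T>() { return default(T); }

    public T Parse<T>(string raw)
    {
        return (T)Convert.ChangeType(raw, typeof(T), CultureInfo.InvariantCulture);
    }

    // Another generic method with one parameter, so filtering only on
    // "generic + one parameter" would be ambiguous
    public T Lookup<T>(object key) { return default(T); }
}

public static class GenericInvoker
{
    public static object ParseAtRuntime(Box box, Type target, string raw)
    {
        // Pick the open generic definition by name AND parameter type
        MethodInfo open = typeof(Box)
            .GetMethods()
            .Single(m => m.Name == "Parse"
                      && m.IsGenericMethodDefinition
                      && m.GetParameters().Length == 1
                      && m.GetParameters()[0].ParameterType == typeof(string));

        // Close it over the runtime type and invoke it on the instance
        MethodInfo closed = open.MakeGenericMethod(target);
        return closed.Invoke(box, new object[] { raw });
    }
}
```

Using Single rather than FirstOrDefault makes the lookup fail loudly if a future version of the class adds another matching overload, instead of silently picking the wrong method.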

Monday 26 October 2015

SecureString

I've used the SecureString class to store passwords in memory a few times in the past, and I've realised that I was using it quite wrongly, so I'll put up here what I've learned lately so that I can avoid such misuse in the future.

Remove from memory
Sometimes you have data (secret data) that you want to keep in memory for as short a time as possible, just for the instant that they are going to be used. While those data are in memory someone could get access to them, either by inspecting the memory with a debugger, by reading the swap file where they might have ended up, or even more easily, by dumping the process memory to disk (in modern Windows this is as easy as it can get, just launch the Task Manager, right click on the process and select "create dump file"). The first thought in order to get rid of those sensitive data would be to release that memory, but in the .Net managed world this is not so simple. After setting the reference to null you need the GC to come into action, and even though you can force a GC.Collect, this is not immediate, and you also have to be aware of generations and so on. So a better option seems to be overwriting the involved memory. For this, the .Net string class is a pretty bad bet, as it is immutable, so you can not overwrite it. You had better use a mutable structure (like a char array).

As a side note, I want to remark here that though you should not use strings for your critical data, it is not as horribly bad as I had come to think. I was thinking that because of string interning any string could end up staying in memory forever, but that is not the case. Interned strings do tend to stay alive until the runtime shuts down, but by default interning only applies to string literals; runtime strings (for example the string with a password that you have just decrypted in memory) won't be interned unless you explicitly call String.Intern.

SecureString
OK, so you'll read everywhere that you must use SecureStrings for all your sensitive data, but how does it really work? A SecureString contains an encrypted version of the string that you want to store in it. The encryption/decryption of the value is based on a symmetric algorithm and a key specific to the current logon session (this is based on DPAPI key management and is explained here). The weak point obviously is the clean, sensitive string that will be encrypted into the SecureString. Some framework classes work directly with SecureStrings, for example WPF's PasswordBox, meaning that if a user types a password in that box, each character will be appended to a SecureString, and you will not end up with the password in a normal string at all. ProcessStartInfo.Password is also a SecureString, but unfortunately there are many situations where the support for SecureStrings is missing.

A common scenario
A relatively common scenario is having an application that needs to start another process or another thread (impersonation) under a different identity (UserB). We'll have the UserB password encrypted with some symmetric key in a file, DB ... we'll read it and decrypt it into a SecureString. This is important: some symmetric encryption classes that I've used for this in the past just returned a String, which is pretty wrong. You can find here a good implementation that returns SecureStrings. Notice how they do the decryption to a char[], build a SecureString from it, and immediately clear the array holding the clean password.

  public void Decrypt(byte[] input, out SecureString output, byte[] key,
            byte[] iv)
        {
            byte[] decryptedBuffer = null;

            try
            {
                // do our normal decryption of a byte array
                decryptedBuffer = Decrypt(input, key, iv);

                char[] outputBuffer = null;
                
                try
                {
                    // convert the decrypted array to an explicit
                    // character array that we can "flush" later
                    outputBuffer = _utf8.GetChars(decryptedBuffer);

                    // Create the result and copy the characters
                    output = new SecureString();
                    try
                    {
                        for (int i = 0; i < outputBuffer.Length; i++)
                            output.AppendChar(outputBuffer[i]);
                        return;
                    }
                    finally
                    {
                        output.MakeReadOnly();
                    }
                }
                finally
                {
                    if (outputBuffer != null)
                        Array.Clear(outputBuffer, 0, outputBuffer.Length);
                }
            }
            finally
            {
                if (decryptedBuffer != null)
                    Array.Clear(decryptedBuffer, 0, decryptedBuffer.Length);
            }
        }

OK, so now we have our password in a SecureString. This is fine for starting a new Process because, as previously mentioned, ProcessStartInfo.Password is a SecureString, but what about impersonation?

As you know, for impersonation we need to obtain an access token via the LogonUser Win32 function. The problem is that this function expects a clean password, not a SecureString, so are we condemned to the risks of having a clean password in our managed memory until the GC decides to clean it?

Fortunately there is a solution that is nicely explained here. The main point is that you can declare the PInvoke signature for LogonUser as receiving an IntPtr for the password parameter rather than a string (I had always done the latter). Then, you have to use Marshal.SecureStringToGlobalAllocUnicode to decrypt and marshal the SecureString into unmanaged memory. Once you are finished with LogonUser, you have to clean up the unmanaged memory holding the unencrypted password by calling Marshal.ZeroFreeGlobalAllocUnicode.
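The steps above can be sketched roughly like this. It's my own minimal take, not the code from the linked explanation: the SecureLogon class, the WithUnmanagedPassword helper and the Logon method are names I've made up, and the logon type/provider constants are just one common choice. The Marshal calls and the LogonUser function in advapi32.dll are the real ones.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Security;

public static class SecureLogon
{
    // The password is declared as IntPtr instead of string.
    // CharSet.Unicode binds to LogonUserW, matching the Unicode buffer below.
    [DllImport("advapi32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern bool LogonUser(string user, string domain, IntPtr password,
        int logonType, int logonProvider, out IntPtr token);

    private const int LOGON32_LOGON_INTERACTIVE = 2;
    private const int LOGON32_PROVIDER_DEFAULT = 0;

    // Hypothetical helper: exposes the decrypted password only as an unmanaged
    // buffer, and zeroes and frees that buffer however the callback ends.
    public static T WithUnmanagedPassword<T>(SecureString password, Func<IntPtr, T> use)
    {
        IntPtr ptr = IntPtr.Zero;
        try
        {
            ptr = Marshal.SecureStringToGlobalAllocUnicode(password);
            return use(ptr);
        }
        finally
        {
            if (ptr != IntPtr.Zero)
                Marshal.ZeroFreeGlobalAllocUnicode(ptr);
        }
    }

    // Windows only. The caller owns the returned token and should close it when done.
    public static IntPtr Logon(string user, string domain, SecureString password)
    {
        return WithUnmanagedPassword(password, ptr =>
        {
            IntPtr token;
            if (!LogonUser(user, domain, ptr, LOGON32_LOGON_INTERACTIVE,
                           LOGON32_PROVIDER_DEFAULT, out token))
                throw new Win32Exception(Marshal.GetLastWin32Error());
            return token;
        });
    }
}
```

The try/finally is the important bit: the clean password lives in unmanaged memory only for the duration of the LogonUser call, and it never becomes a managed String that we can't overwrite.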

Sunday 25 October 2015

Thread Return Value

It's clear to me that since the introduction of the TPL in most cases there is no reason to create Threads directly via the Thread class, and we should just use Tasks via any of the overloads of Task.Run. Tasks work as a higher level abstraction over the low level Threads, making use of the ThreadPool, and whenever possible we should use abstractions. Anyway, I still tend to occasionally create Threads directly, just from habit. The other day I noticed an interesting difference between both approaches that leads to writing quite different code in each case.

Mainly, Threads started via the Thread class do not return a result. You create your Thread, start it with the Start method and at some point the thread will end, but you are not provided with a generic place where you can get the return value from that code (that's why you can only pass to the Thread constructor a delegate that returns no value). Obviously you could work around this by setting that value in some global place (don't do that), or in a property of some object passed as a parameter to the Thread, or by wrapping the Thread in a class with a Result property (Thread itself is sealed, so it can't be extended)...
Notice that another possibility would have been that Thread.Join returned a value (this is what happens in the horrific thread API provided by perl), but Thread.Join returns nothing.
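Just to illustrate the kind of workaround I mean, here's a minimal sketch where the thread writes its result into a local variable captured by the lambda, and the caller reads it after Join. The ThreadResultDemo class and its Compute method are hypothetical, made up for the example.

```csharp
using System;
using System.Threading;

public static class ThreadResultDemo
{
    // Hypothetical piece of work whose result we want back
    private static int Compute()
    {
        return 6 * 7;
    }

    public static int RunAndGetResult()
    {
        int result = 0;

        // The delegate returns nothing, so the result is smuggled out
        // through the captured local variable
        var th = new Thread(() => { result = Compute(); });
        th.Start();

        // Join returns void; it only tells us the thread has finished,
        // at which point it is safe to read the captured variable
        th.Join();
        return result;
    }
}
```

It works, but the "result" is just a convention between the thread body and the caller, nothing in the Thread API knows about it.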

Well, the idea of having a Result property in the Thread class is basically what we have with Task<TResult>, as we get a Result property to hold the result of the code. Cool. This difference can have an interesting effect on how we write our consuming code, basically avoiding the use of locks. Let's see what I mean.

Let's say we have a class that downloads a post. It's an "old school" one working synchronously, a call to GetPost will just block until the operation is complete. We are going to download several posts in parallel, once all of them are finished we want to have these downloads in a dictionary of url, post content.

If we use classic Threads, we'll have to run in the thread the downloading code and the code that will add the result to a dictionary, and as several threads can be accessing this Dictionary at the same time, we have to synchronize the access by using a lock statement (that is, a Monitor)

  private static void ClassicThreads()
  {
   var lockHelper = new Object();
   var downloader = new PostDownloader();
   foreach (string url in urls)
   {
    //c# 5, the foreach var is internal to the loop, so each closure closes over a different variable...
    var th = new Thread(() =>
    {
              var txt = downloader.GetPost(url);
              Console.WriteLine(url + " downloaded");
              lock(lockHelper)
              {
               results.Add (url, txt);
              }
          });
    
    th.Start();
   }
   while(results.Count != urls.Count)
   {
    Thread.Sleep(200);
   }
   Console.WriteLine("All Done!");
   Console.WriteLine("- Results:\n" + resultsToString(results));
  }

If we use Tasks, we can run in the Task/thread only the downloading code, then wait for all the Tasks to be completed, and fill our dictionary with the results from the main thread, by reading the Task.Result property, no need for any synchronization.

  private static void TasksBased()
  {
   
   var downloader = new PostDownloader();
   var tasks = new List<Task<KeyValuePair<string, string>>>();
   foreach (string url in urls)
   {
    Task<KeyValuePair<string, string>> downloadTask = Task.Run(() => {
      var res = downloader.GetPost(url);
      Console.WriteLine(url + " downloaded");
      return new KeyValuePair<string, string>(url, res);
    });
    
    tasks.Add(downloadTask);

   }
   Task.WaitAll(tasks.ToArray());
   
   foreach (var task in tasks)
   {
    results.Add(task.Result.Key, task.Result.Value);
    
   }
   
   Console.WriteLine("All Done!");
   Console.WriteLine("- Results:\n" + resultsToString(results));
  }

This article gives a nice overview of Threads, ThreadPool and Tasks.

Sunday 18 October 2015

.Net 4.6

While starting to get up to date with .Net 4.6 I've come across a few interesting things. Let's see.

I already wrote some months ago about how .Net 4.5 is an in-place replacement for .Net 4. With 4.6 it's just the same, and the technique described in that post still applies in order to verify which .Net version you have installed. If I check the version of clr.dll on my PC, I'll see that it is now: 4.6.96.0.

As you know Microsoft has been working for a while on a new and improved JIT (RyuJIT). In the past it was possible (I never tried it) to install it (protojit.dll) side by side with the standard one (clrjit.dll), and select one or the other. In 4.6 this new JIT has become the standard, and as such has been renamed to clrjit.dll. The old JIT is distributed with the framework in the compatjit.dll file, and if needed you can force an application to revert to it by setting a value in app.config. It's explained here.

.Net 4.6 comes with C# 6, and the long awaited Roslyn. However, the whole thing is quite a bit bizarre. The csc.exe compiler that you'll find in the Framework installation folder (C:\Windows\Microsoft.NET\Framework64\v4.0.30319) is the old compiler, not based on Roslyn and with support only for C# 5. You can verify this by just trying to compile some code taking advantage of some new C# 6 feature. For example, trying to compile a program including a line like this:

int? length = list?.Count;

will give you these errors:

error CS1525: Invalid expression term '.'
error CS1003: Syntax error, ':' expected

So the new Roslyn based, C# 6 compiler is not installed with the Framework!!!
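For reference, this is what that line does once you compile it with a C# 6 (Roslyn) compiler. It's a tiny sketch of the null-conditional operator; the NullConditionalDemo class and SafeCount method are made up for the example.

```csharp
using System.Collections.Generic;

public static class NullConditionalDemo
{
    // list?.Count evaluates to null when list is null, instead of
    // throwing a NullReferenceException; otherwise it's list.Count
    public static int? SafeCount(List<string> list)
    {
        return list?.Count;
    }
}
```

With the old C# 5 compiler the equivalent would have to be written as list == null ? (int?)null : list.Count.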

Has Microsoft gone crazy or what?
The new compiler is installed when you install Visual Studio 2015, and seems to get placed in: %ProgramFiles(x86)%\MSBuild\14.0\bin (it's explained here). This appears not to be a big problem for most developers, as you can install Visual Studio Community for free, but as it's a fully functional Visual Studio, the installation is slow and painful. I really can not understand how you can install SharpDevelop in 1 minute and less than 50 MB while installing Visual Studio continues to take hours and GBs and GBs.

Fortunately, for those few of us that don't plan to install VS 2015 (on my personal laptop, in my case), we can install Roslyn on its own in 1 minute anyway. You just need nuget (which, by the way, can now also be installed and used on its own, just download the binary from the nuget site), and run this command:
nuget install Microsoft.Net.Compilers
as explained here.
The new C#, roslyn based compiler is still named csc.exe, but its version is:
Microsoft (R) Visual C# Compiler version 1.0.0.50618
while the version of the non-Roslyn compiler is:
Microsoft (R) Visual C# Compiler version 4.6.0081.0!!!

Yes, pretty bizarre, thanks Microsoft for adding more confusion to our lives... I have installed the Roslyn compiler to a Roslyn folder and renamed the binary to csc6.exe, because as I have the folders for both the old and the new compiler in my path environment variable, I needed a way to make sure I'm using the correct one.

Thursday 8 October 2015

Call the scum Daesh

For some time I had been wondering why almost all the French media tend to call the ISIS monstrosity by its Arabic name, Daesh. It caused me some discomfort, as to some extent it seemed to me that using the Arabic name entailed some sort of recognition of them. I've just found that I was totally confused: you can read here that the Islamo-Fascist beast considers the Daesh term an insult, because it sounds like the Arabic words Daes ("one who crushes something underfoot") and Dahes ("one who sows discord"). I've read somewhere that the French government sent a recommendation to the media to call this abomination "Daesh".

I pretty much like this stance, and I'll try to adhere to it myself. In the past I've found it quite revolting that many media (CNN for example) called them "militants". This is a neutral term (indeed I would tend to give it a positive undertone), and for the most part I don't believe in neutrality, indeed I hate neutrality, not taking sides is just a sign of cowardice and mercantilism. One can not be neutral before sadism and wickedness, one has to take sides and push others into also taking sides, not doing so turns you into an accomplice. You have to use a term that shows disgust, repulsion and contempt, you can not call Daesh "militants", you can not call Saudi Arabia or Qatar "states", you can not call Erdogan "president" and you can not call the Turkish repressive forces "an army", you have to call them "sadists", "terrorists", "vicious murderers" and "genocidals", because that's what they are. I hope someday history books will call them like that, and will call all those that fight against them HEROES, like the Kurdish freedom fighters (YPG/YPJ, PKK) and the international volunteers.

Well, I think I'll leverage this post to put here a link to this inspiring video by the Kurdish left wing rapper Castro, and this other one. Would love to see those Kurdish troops enter one day in the Turkish presidential palace and burn it down to ashes. Long live to the Kurdish revolution! and Boycott Turkey, let's destroy their economy same as they torture, murder and rape innocent Kurds. Hopefully economic pressure could force progressive Turks and "normal" Turks to unite and revolt against Erdogan and the AKP scum, but well, honestly, with a population that votes massively for the Erdogan beast, I'm afraid there's little good to expect from the "average Turk", as the "average Turk" seems to be an Islamo-nationalist that denies other people's rights, denies the Armenian genocide, agrees with the invasion of Cyprus, and so on and on.

Sunday 4 October 2015

REC 4

[REC 4] has been a pretty nice surprise. I wanted to leverage the Cinespanha festival in Toulouse to go watch some film, but I didn't feel like spending much time going through the program, so when I found they were showing one more instalment of the [REC] franchise, the plan was done.

I had great memories of [REC 1] and [REC 2], but [REC 3] was crap (this is not just my feeling, it seems common to most critics), so much so that I have to admit that I even thought about backing out and saving myself the 7 euros of the ticket. Well, maybe [REC 3] would not feel so bad if you just looked at it as one more zombie film, but if you put it alongside its predecessors, it's just crap. So, I was intrigued (and a bit scared) as to whether this 4th instalment would be in line with the good ones or with the shit one. Fortunately, it's been the former.

[REC 4] is a perfect continuation of the story developed in 1 and 2. It's as thrilling and intense as its predecessors. Though also set in a closed space (a ship rather than an old building), it does not achieve the same claustrophobic atmosphere, but it's just excellent anyway. I quite liked the turns taken by the 3 films in explaining what is happening. While [REC 2] moved us from the infectious disease realm into the paranormal, demonic possession territory, [REC 4] moves us back into the infectious disease land.

The [REC] series (except for the third, crappy film, in which he was not involved) is the best known work by Spanish/Catalonian (I have no idea what his feelings of identity are) director Jaume Balaguero, but his previous films, The Nameless, Darkness and Fragile, are pretty good too (Darkness in particular). I have realised that I had missed "Sleep Tight", so I'll have to put it on my ToDo list, along with watching [REC 1 and 2] again.