Monday, October 1, 2007

String References Part 2

In a previous post, I wrote that two identical strings would always share the same reference. Now let's go all the way, in true MythBusters style, and try to break that rule!

So how do we break the rule? Well, I'll start by defining the scenarios that I believe could force .NET into creating two different references. Once that's done, I'll test them one by one.

Overview of the Tests


  • The System.String.Clone() method.

  • Creating one of the String objects by concatenating two other String objects.

  • Creating one of the String objects through System.Text.StringBuilder.

  • Passing a String to a method using the "ref" keyword.

  • Creating the two objects in two different assemblies.

  • Loading 50 MB of text into each of the two String objects.

  • The System.String( Char* ) constructor.

  • Binary Deserialization.

  • The System.Runtime.InteropServices.Marshal.PtrToStringUni( IntPtr ) method.


The System.String.Clone() method
This test is very simple: just call the Clone() method and compare the results. I'm pretty sure this one will break the rule; after all, a Clone() implementation shouldn't return its own instance.

Creating one of the String objects by concatenating two other String objects
This test is just as simple as the previous one. I don't think it will actually change the reference.
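Here is a minimal sketch of how the concatenation scenario could be reproduced; it is not the code from the downloadable project. It assumes a current C# compiler (it uses top-level statements, so it won't build in Visual Studio 2005), and the variable names are made up for the example. It separates two cases that are easy to mix up: concatenating constants, which the compiler folds into a single literal, and concatenating ordinary variables, which happens at run time through String.Concat.

```csharp
using System;

// The reference string: a literal, so it comes from the runtime's intern pool.
string literal = "Hello World";

// Case 1: both operands are constants, so the C# compiler folds
// hello + world into the literal "Hello World" at compile time.
const string hello = "Hello ";
const string world = "World";
string folded = hello + world;

// Case 2: ordinary (non-constant) locals, so the concatenation is
// performed at run time by String.Concat, which allocates a new String.
string left = "Hello ";
string right = "World";
string runtime = left + right;

Console.WriteLine("Constant concat, same reference: " + object.ReferenceEquals(literal, folded));
Console.WriteLine("Runtime concat, same reference:  " + object.ReferenceEquals(literal, runtime));
Console.WriteLine("Runtime concat, same text:       " + literal.Equals(runtime));
```

Comparing with object.ReferenceEquals rather than == matters here: for strings, == compares the characters, so it would report True in every case.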

Creating one of the String objects through System.Text.StringBuilder
Again, we're dealing with a simple test. I'm not sure what to expect from this one, but if I have to choose, I'd say 49.9% for different references and 50.1% for the same reference.
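A minimal sketch of the StringBuilder scenario, under the same assumptions as before: a current C# compiler with top-level statements, and names invented for the example rather than taken from the project.

```csharp
using System;
using System.Text;

// The reference string is a literal.
string literal = "Hello World";

// Assemble the same text piece by piece.
var builder = new StringBuilder();
builder.Append("Hello");
builder.Append(' ');
builder.Append("World");

// ToString() hands back a String created at run time from the builder's
// contents; it is not looked up in the intern pool.
string built = builder.ToString();

Console.WriteLine("Same text:      " + literal.Equals(built));
Console.WriteLine("Same reference: " + object.ReferenceEquals(literal, built));
```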

Passing a String to a method using the "ref" keyword
Yet another simple test. The references should be the same, but you can always hope.
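The ref scenario can be sketched like this, again assuming a current C# compiler; the helpers ReturnViaRef and ReturnViaValue are hypothetical names for the example. What ref changes is that the method receives a reference to the caller's variable instead of a copy of the variable's value; in neither case is the String object itself copied.

```csharp
using System;

string original = "Hello World";

// Hand the same string to a method by reference and by value.
string viaRef = ReturnViaRef(ref original);
string viaValue = ReturnViaValue(original);

Console.WriteLine("ref parameter, same reference:   " + object.ReferenceEquals(original, viaRef));
Console.WriteLine("value parameter, same reference: " + object.ReferenceEquals(original, viaValue));

// Hypothetical helper: 'text' is an alias for the caller's variable.
static string ReturnViaRef(ref string text)
{
    return text;
}

// Hypothetical helper: 'text' holds a copy of the caller's reference
// (the String object it points to is not copied).
static string ReturnViaValue(string text)
{
    return text;
}
```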

Creating the two objects in two different assemblies
This was a suggestion from a colleague. I can see the idea, but I have my doubts, because as soon as I reference both assemblies, the strings will exist in the same AppDomain.

Loading 50 MB of text into each of the two String objects
The idea behind this test is that if .NET uses GetHashCode() to index the strings, it might not be that quick with larger strings, and therefore the string indexing engine could have an upper limit on how long a string it will handle. Besides, if it is using GetHashCode(), two non-identical strings could actually end up returning the same hash code. Personally, I think the references will still be the same in this test.
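The sketch below builds the large-string case at run time instead of loading files. It assumes a current C# compiler and uses 1,000,000 characters, an arbitrary size far smaller than the 50 MB texts in the test, to keep it quick. It also calls String.Intern, which is defined in terms of the string's value: a hash table uses the hash code only to narrow the search and then compares the actual characters, so two different strings that share a GetHashCode() value are not merged.

```csharp
using System;

// Hypothetical size for the example: 1,000,000 characters (about 2 MB of UTF-16).
const int length = 1_000_000;

// Two identical large strings created at run time.
string first = new string('x', length);
string second = new string('x', length);

Console.WriteLine("Same text:      " + first.Equals(second));
Console.WriteLine("Same reference: " + object.ReferenceEquals(first, second));

// String.Intern returns the pooled instance for this text, adding the
// first string to the pool if the text is not there yet.
string internedFirst = string.Intern(first);
string internedSecond = string.Intern(second);

Console.WriteLine("Interned, same reference: " + object.ReferenceEquals(internedFirst, internedSecond));
```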

The System.String( Char* ) constructor
This is my personal favorite, and I believe this test will result in two different references. Because the characters are laid out in memory just the way a String object wants them, the String object could simply use the pointer as its reference.
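The String(Char*) constructor can only be called from unsafe code, which needs the /unsafe compiler switch, so this sketch swaps in its safe sibling, the String(Char[]) constructor. It assumes a current C# compiler. Both constructors copy the characters into a newly allocated String rather than pointing at the caller's memory, which the last lines demonstrate by changing the array afterwards.

```csharp
using System;

string literal = "Hello World";

// The same text, held in a character array.
char[] characters = literal.ToCharArray();

// The constructor allocates a new String and copies the characters into it.
string fromArray = new string(characters);

Console.WriteLine("Same text:      " + literal.Equals(fromArray));
Console.WriteLine("Same reference: " + object.ReferenceEquals(literal, fromArray));

// Changing the array afterwards does not affect the string,
// which shows that the characters were copied, not shared.
characters[0] = 'J';
Console.WriteLine("String after changing the array: " + fromArray);
```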

Binary Deserialization
I admit it: I was searching for a couple more tests at this point. Nonetheless, it might actually work.

The System.Runtime.InteropServices.Marshal.PtrToStringUni( IntPtr ) method
Well, this is the first time I'm actually using this method, but judging from the name, I'd guess that it will return the same reference.
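A sketch of the PtrToStringUni scenario, assuming a current C# compiler. To have a pointer to work with, it first copies the text into unmanaged memory with Marshal.StringToHGlobalUni, then reads it back with Marshal.PtrToStringUni and releases the buffer with Marshal.FreeHGlobal.

```csharp
using System;
using System.Runtime.InteropServices;

string literal = "Hello World";

// Copy the text into unmanaged memory as null-terminated UTF-16.
IntPtr buffer = Marshal.StringToHGlobalUni(literal);

string fromPointer;
try
{
    // Read the characters back up to the terminating null; this creates
    // a new managed String from the unmanaged buffer.
    fromPointer = Marshal.PtrToStringUni(buffer);
}
finally
{
    // Unmanaged memory is not garbage collected, so release it explicitly.
    Marshal.FreeHGlobal(buffer);
}

Console.WriteLine("Same text:      " + literal.Equals(fromPointer));
Console.WriteLine("Same reference: " + object.ReferenceEquals(literal, fromPointer));
```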

That was a short description of all the tests. Before I show you the results, I think you should have the chance to download the project and try it out for yourself. As usual, it's a Visual Studio 2005 C# project targeting the Windows platform.


Download
To download the full project, click here.


And now...

for the results!!


I have to admit that I'm very surprised that the String.Clone() method did not change the reference. Looking at the code for the method, you will quickly see why:

  public object Clone()
  {
      return this;
  }


Microsoft probably implemented it like this with the "identical strings share the same reference" rule in mind. Seen that way, the implementation is valid, and because a String is immutable, a separate copy could never end up different from the original anyway.
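You can confirm this behavior with a few lines; this is a sketch assuming a current C# compiler, and it also calls Clone() through the ICloneable interface that String implements.

```csharp
using System;

string original = "Hello World";

// String.Clone() returns object, so cast it back to string.
string cloned = (string)original.Clone();

// The same call made through the ICloneable interface.
ICloneable cloneable = original;
object viaInterface = cloneable.Clone();

// Clone() returns 'this', so all three variables refer to the same object.
Console.WriteLine("Clone(), same reference:     " + object.ReferenceEquals(original, cloned));
Console.WriteLine("ICloneable, same reference:  " + object.ReferenceEquals(original, viaInterface));
```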

The Conclusion
From this test I can safely state that it is possible to have two identical strings with different references.
