.NN #19: CorrelationManager

This .NET Nugget is going to focus on the CorrelationManager class from the System.Diagnostics namespace.  According to the MSDN docs this class “Correlates traces that are part of a logical transaction”.  While that’s technically accurate it isn’t very helpful.  However, if you completely understand its purpose from that statement you can skip the rest of this post.

Code from this Post can be found online at my MSDN Code Gallery for the .NET Nugget series.

I’m going to start with the Trace class.  I think most .NET developers are familiar with this class, but as a recap, this class is used to write messages from your code for instrumentation and tracing.  These messages can then be picked up by any registered TranceListener.  TraceListeners can then take the output and do any number of things with them such as writing them to a log file or database.  An instance of the DefaultTraceListener class is added to the TranceListenersCollection automatically unless you specifically remove it in configuration or programmatically.  This default listener takes the Trace messages and writes them to Win32 OutputDebugString function, which means that anything monitoring that OS API function can intercept the messages.  Applications such as DebugView from SysInternals/Microsoft and most debuggers watch for these messages.  When you run an application in the Visual Studio debugger you can see the Trace messages written out in the Output window because Visual Studio is monitoring the buffer OutputDebugString writes to.

NOTE: The TRACE compiler constant must be checked in your build options for trace code to even be compiled into your code.  Even for most Release code you’ll want to have the TRACE compiler constant on.  You can always use a TraceSwitch to control if your code actually outputs anything or not.

A quick example of using the Trace class can be found in the code below.  I’ve written this in VB.Net and it is a simple console application that calls a method which writes two trace messages.  Note that I’m using the new Trace.TraceInformation method instead of the old Trace.WriteLine

Module Main
    Sub Main()
        SomeClass.SomeOperation(10)
    End Sub
 
End Module
 
Public Class SomeClass
 
    Private Shared _rnd As New Random
 
    Public Shared Sub SomeOperation(ByVal SomeImportantID As Integer)
        Trace.TraceInformation("Starting some operation for ID: {0}", _
             SomeImportantID.ToString())
        Threading.Thread.Sleep(_rnd.Next(0, 2000))
        Trace.TraceInformation("Completed some operation")
    End Sub
End Class

The reason I put the semi-random sleep in the method will be evident in a minute.  If you run this code you’d see the following in the output window of Visual Studio:

DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Starting some operation for ID: 10
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Completed some operation

Not earth shattering, but if we were troubleshooting a problem in production and had some trace output in our code we could at least see what was going on.  Note that I included the ID in the trace output so we can see what the ID was of the item we were working with in the method.  Now, let’s try doing this with multiple threads.

    Sub Main()
        Dim worker1 As New Thread(AddressOf SomeClass.SomeOperation)
        Dim worker2 As New Thread(AddressOf SomeClass.SomeOperation)
        worker1.Start(10)
        worker2.Start(10)
    End Sub

All I did was change the Main thread to call the SomeOperation method on two threads.  What you see in the Output window now is (I’ve removed some of the other output to focus on what is important):

DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Starting some operation for ID: 10
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Starting some operation for ID: 10
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Completed some operation
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Completed some operation

Wow.  Well, not nearly as helpful.  We see that they both ran and in this case our addition of the ID to the trace output to let us know what we were working on was not helpful at all since the operation was concurrently executing for the same ID (depending on your system that may never happen, or it may happen all the time).  The other problem is that we don’t know which one finished first (thanks to the random execution time), so if we were troubleshooting we couldn’t be sure what was actually going on.  This is where the CorrelationManager comes in.

    Public Shared Sub SomeOperation(ByVal SomeImportantID As Integer)
        Trace.CorrelationManager.StartLogicalOperation("workerThread")
        Trace.CorrelationManager.ActivityId = Guid.NewGuid()
        Trace.TraceInformation("Starting some operation for ID:{0}, ActivityID:{1}", _
          SomeImportantID.ToString(), Trace.CorrelationManager.ActivityId.ToString())
        Threading.Thread.Sleep(_rnd.Next(0, 2000))
        Trace.TraceInformation("Completed some operation, ActivityID:{0}", _
          Trace.CorrelationManager.ActivityId.ToString())
        Trace.CorrelationManager.StopLogicalOperation()
    End Sub

Notice that now in the SomeOperation method I start a logical operation via the CorrelationManager.  I also set the ActivityID to a new GUID.  In my trace output I’m adding the ActivityID to the string to both messages.  Now when I run the code I get the following on the output (I formatted it a bit for readability):

DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Starting some operation for ID:10, ActivityID:08d3cfe2-0e9e-4fea-8db5-8956904639cf DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Starting some operation for ID:10, ActivityID:ae03abce-fbca-445d-8b0f-06a54ed87079 DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Completed some operation, ActivityID:ae03abce-fbca-445d-8b0f-06a54ed87079 DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Completed some operation, ActivityID:08d3cfe2-0e9e-4fea-8db5-8956904639cf

From this I can easily tell which trace messages go together.  Now, that’s helpful, but that sure was a lot of code, plus in all of our trace messages I had to put the ActivityID in the output manually.  Let’s work on that. 

First I’m going to abstract some of the code clutter out by providing a nice wrapper around the CorrelationManager usage.  We’ll call it a Tracer (Enterprise Library has a class called the Tracer which accomplishes the same task but has a lot more in it than what I’m going to include for this example.  The EntLib one does deserve much of the credit for where this simpler form came from).

Public Class Tracer
    Implements IDisposable
 
    Private disposed As Boolean = False
    Private timer As Stopwatch
 
    Public ReadOnly Property ActivityId() As Guid
        Get
            Return Trace.CorrelationManager.ActivityId
        End Get
    End Property
 
 
    Public Sub New(ByVal operationName As String)
        If Trace.CorrelationManager.ActivityId.Equals(Guid.Empty) Then
            Trace.CorrelationManager.ActivityId = Guid.NewGuid()
        End If
        InitializeTracer(operationName)
    End Sub
 
    Private Sub InitializeTracer(ByVal operation As String)
        timer = Stopwatch.StartNew()
        Trace.CorrelationManager.StartLogicalOperation(operation)
        Trace.WriteLine(String.Concat("Starting operation ", operation))
    End Sub
 
    ' IDisposable
    Protected Overridable Sub Dispose(ByVal disposing As Boolean)
        If disposing AndAlso Not Me.disposed Then
            ' Free other state (managed objects).
            timer.Stop()
            Try
                Trace.WriteLine(
                    String.Format("The operation ({0}) took {1} ms to execute.", _
                    Trace.CorrelationManager.LogicalOperationStack.Peek.ToString(), _
                    timer.ElapsedMilliseconds.ToString()))
            Finally
                Trace.CorrelationManager.StopLogicalOperation()
            End Try
        End If
        Me.disposed = True
    End Sub
 
    Public Sub Dispose() Implements IDisposable.Dispose
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
 
    Protected Overrides Sub Finalize()
        Dispose(False)
        MyBase.Finalize()
    End Sub
End Class

And our worker method is simplified to:

Public Shared Sub SomeOperation(ByVal SomeImportantID As Integer)
    Using operationTracer As New Tracer("worker")
        Trace.TraceInformation("Working with ID: {0}", _
            SomeImportantID.ToString())
        Threading.Thread.Sleep(_rnd.Next(0, 2000))
    End Using
End Sub

Now our output looks like this:

DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Starting operation worker
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Working with ID: 10
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Starting operation worker
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : Working with ID: 10
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : The operation (worker) took 846 ms to execute.
DotNetNugget.CorrelationManager.vshost.exe Information: 0 : The operation (worker) took 965 ms to execute.

With a nice little wrapper we have now made our method code a lot simpler.  We just wrap what we want to trace in a Using statement.  The creation of the Tracer will handle the CorrelationManager work and will write out starting and ending messages including how long it took to execute!  But you may be wondering what happened to the ActivityID and why I didn’t include that.  Well, it is a pain to have to include that on all of your messages and luckily you don’t have to.  Visual Studio is using the textual message from the default tracelistener to populate the output window, but the Trace code actually captures a LOT more than just the message.  Let’s switch to a XmlWriterTraceListener and see what we get.  First I add the tracelistener via our app.config file.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <system.diagnostics>
        <sharedListeners>
            <add name="xmlListener"
                type="System.Diagnostics.XmlWriterTraceListener"
                initializeData="xmlOutput.xml"
                traceOutputOptions="LogicalOperationStack"  />
        </sharedListeners>
        <trace autoflush="true" indentsize="4" >
            <listeners>
                <add name="xmlListener"  />
                <remove name="Default" />
            </listeners>
        </trace>
    </system.diagnostics>
</configuration>

Notice that I added a “remove” for the Default tracelistener.  I didn’t have to do that, but by doing so you’ll no longer see the output in the Visual Studio Output window.  Based on this configuration all of the trace messages will be captured to a file named xmlOutput.xml in the bin directory.  In fact, when we run this we’ll get several entries in this file.  An example of one of these entries looks like this:

<E2ETraceEvent
    xmlns="http://schemas.microsoft.com/2004/06/E2ETraceEvent">
    <System
        xmlns="http://schemas.microsoft.com/2004/06/windows/eventlog/system">
        <EventID>0</EventID>
        <Type>3</Type>
        <SubType Name="Information">0</SubType>
        <Level>8</Level>
        <TimeCreated SystemTime="2008-07-05T22:19:22.7384000Z" />
        <Source Name="DotNetNugget.CorrelationManager.vshost.exe" />
        <Correlation ActivityID="{c63984a6-b694-4cea-ba52-dffacb5855a1}" />
        <Execution ProcessName="DotNetNugget.CorrelationManager.vshost"
            ProcessID="8492" ThreadID="11" />
        <Channel/>
        <Computer>HERMES</Computer>
    </System>
    <ApplicationData>
        The operation (worker) took 73 ms to execute.
        <System.Diagnostics
            xmlns="http://schemas.microsoft.com/2004/08/System.Diagnostics">
            <LogicalOperationStack>
                <LogicalOperation>worker</LogicalOperation>
            </LogicalOperationStack>
            <Timestamp>8596097149536</Timestamp>
        </System.Diagnostics>
    </ApplicationData>
</E2ETraceEvent>

That’s a lot of information!  This is the product of a single TraceInformation statement.  We get a timestamp of when it was created, the application that logged it, the machine it was logged from, what process and thread it came from and you can see the ActivityID is included.  If you looked at the full output you could map which messages went with what thread.  This is a lot of information and the other tracelisteners, or even a custom one, could help trim this down or do something interesting with it like store it in a database for troubleshooting. (Side Note: the ActivityID was not coming out with the standard Trace.WriteLine or Trace.Write methods, I had to use one of the new Trace.TraceXXXX methods to get it to appear).

The last thing I want to mention about the CorrelationManager is that it stores it’s information in Thread local storage.  This information is copied to child threads that are spun up as well.  If we change the the Main method a bit to contain it’s own Tracer (which effectively makes two nested Tracers as the Main method calls the other two) then we can see that the ActivityID is the same for all of the entries, but also check out the LogicalOperationStack elements.

Sub Main()
    Using mainOperationTracer As New Tracer("Main")
        Dim worker1 As New Thread(AddressOf SomeClass.SomeOperation)
        Dim worker2 As New Thread(AddressOf SomeClass.SomeOperation)
        worker1.Start(10)
        worker2.Start(10)
    End Using
End Sub
<E2ETraceEvent
    xmlns="http://schemas.microsoft.com/2004/06/E2ETraceEvent">
    <System
        xmlns="http://schemas.microsoft.com/2004/06/windows/eventlog/system">
        <EventID>0</EventID>
        <Type>3</Type>
        <SubType Name="Information">0</SubType>
        <Level>8</Level>
        <TimeCreated SystemTime="2008-07-05T22:38:29.1294000Z" />
        <Source Name="DotNetNugget.CorrelationManager.vshost.exe" />
        <Correlation ActivityID="{cbc023d6-3b94-4f25-9991-fc6ed778d0ee}" />
        <Execution ProcessName="DotNetNugget.CorrelationManager.vshost"
            ProcessID="2944" ThreadID="12" />
        <Channel/>
        <Computer>HERMES</Computer>
    </System>
    <ApplicationData>
        The operation (worker) took 523 ms to execute.<System.Diagnostics
            xmlns="http://schemas.microsoft.com/2004/08/System.Diagnostics">
            <LogicalOperationStack>
                <LogicalOperation>worker</LogicalOperation>
                <LogicalOperation>Main</LogicalOperation>
            </LogicalOperationStack>
            <Timestamp>8612512327233</Timestamp>
        </System.Diagnostics>
    </ApplicationData>
</E2ETraceEvent>

Note that the LogicalOperationStack now includes the worker as well as Main LogicalOperations.  It is a stack so the most recent operation is at the top.  This is really cool because this means we can start a Tracer at a top level operation and even with nested Tracers or operations below it everything gets tagged with the same ActivityID and we can clearly see the LogicalOperationStack of each. 

The idea of a correlating value for your logging operations is an excellent way to build in instrumentation and tracing so that you increase the maintainability of your code.  If you capture this type of output to a database or an easily query-able store then when problems arise you can flip on your TraceSwitch and then use the information to diagnose where the issue is. 

On a previous project  we were using Enterprise Library logging which has this concept of correlating ActivityID as well (actually using the same mechanism as what you see above).  We even carried that correlating value over to the database so that it was the last parameter passed to ALL stored procedures.  The database developers then also used this correlating value in their log messages.  This meant we could trace a logical operation or user session completely which can come in really handy.

I know this has been a long post, so hopefully you stuck it out.  I will finish up by saying that deciding how you are going to deal with logging in your application is often an overlooked aspect of projects, but it is very important.  Not only do you have to decide how you will log, but also what levels of logging you’ll deal with (Error, Warning, Information, etc.) and have guidelines for when to use each.

As usual, please let me know what you think.