This is the first part of the Logging in distributed system series. For your convenience you can find other parts using the links below (or by guessing the address):
Part 1 — Correlations
Part 2 — Action filters
Part 3 — Exceptions
Part 4 — Passing data to other systems
Part 5 — Parsing logs

Distributed systems are hard. Patterns of Enterprise Application Architecture says that “You shouldn’t try to break up a single application into Web services that talk to each other unless you really need to.” However, nowadays microservices (and even nanoservices) are very popular. One issue when working with such an architecture comes down to logging: there are things which simply must be reliable, because without them debugging errors is close to impossible.

General idea

Generally it is a good idea to gather logs from different components in one place. That’s why there are stacks like ELK (Elasticsearch, Logstash and Kibana) which ingest logs, transform them to a common format and display them. Search capabilities and built-in alerts are also nice features.

In order to analyse logs, you need to be able to correlate them. That’s why a correlation ID is often a “must-have” pattern. In this series I am going to show you a simple implementation of a logger handling correlation IDs and Lamport’s happened-before relation for ordering events. Do not consider this the one solution to rule them all; it is just a base implementation ready to be extended as you need.

Components

Let’s assume that our system consists of a Sitefinity CMS, an Azure WebJob and a REST API based on WebAPI 2. All these components are hosted in Azure. We want to design code for logging events in a uniform way.

Interfaces

First, let’s start with interfaces:

using DomainCommons.Correlations;

namespace DomainCommons.Loggers
{
	public interface ILogger
	{
		void Log(LogLevel level, string message);
		ICorrelator Correlator { get; }
	}
}

We have an interface for our logger. It gives us a single method for logging an event, accepting a severity level (see below) and a string message. Ideally, the logger implementation should add the necessary infrastructure details to the message (node ID, correlation ID, timestamp etc.).

namespace DomainCommons.Loggers
{
	public enum LogLevel
	{
		Debug = 1,
		Information = 2,
		Warning = 3,
		Error = 4
	}
}
namespace DomainCommons.Correlations
{
	public interface ICorrelator
	{
		string GetCorrelationId();
		int GetLogicalTime();
		void UpdateLogicalTime(int newTime);
		string Activity { get; set; }
	}
}

Since we need to maintain logical time (Lamport’s happened-before relation) and a correlation ID, we have an interface to store this data. We also store an activity which represents the currently executed action.

We also use a bunch of constants for the header and item names:

namespace DomainCommons.Loggers
{
	public static class Constants
	{
		public static readonly string CorrelationIdHeader = "X-Correlation-ID";
		public static readonly string CorrelationCounterHeader = "X-Correlation-Counter";
		public static readonly string CorrelationIdItem = "CorrelationId";
		public static readonly string LoggerItem = "Logger";
	}
}

That is it for the interfaces. Now we need to implement them. Let’s start with correlators.

Correlators

We basically have two entry points: one for the web application and one for the web job. The web application should get the correlation ID from request headers or create a new one. The web job always needs to create the ID on its own. Let’s start with the base correlator:

using System;
using System.Threading;

namespace DomainCommons.Correlations
{
	public abstract class Correlator : ICorrelator
	{
		private int _logicalTime;

		public abstract string GetCorrelationId();
		public abstract string Activity { get; set; }

		public int GetLogicalTime()
		{
			return Interlocked.Increment(ref _logicalTime);
		}

		public void UpdateLogicalTime(int newTime)
		{
			int currentTime, finalTime;

			do
			{
				currentTime = _logicalTime;
				finalTime = Math.Max(currentTime, newTime);
			} while (Interlocked.CompareExchange(ref _logicalTime, finalTime, currentTime) != currentTime);
		}

	}
}

The only logic here maintains the logical time. Since we don’t want locks degrading performance, we use interlocked operations (a compare-and-swap loop) to update the logical time.
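The update loop above can be sketched in isolation. Below is a minimal, self-contained version of the same clock logic (the class and method names are illustrative, not part of the article’s code):

```csharp
using System;
using System.Threading;

// Minimal sketch mirroring the Correlator's clock logic.
class LamportClock
{
	private int _time;

	// Tick before emitting an event: returns the incremented local time.
	public int Tick() => Interlocked.Increment(ref _time);

	// On receiving a remote time, keep the maximum without taking a lock.
	public void Merge(int remoteTime)
	{
		int current, next;
		do
		{
			current = _time;
			next = Math.Max(current, remoteTime);
		} while (Interlocked.CompareExchange(ref _time, next, current) != current);
	}

	public int Current => _time;
}

class Program
{
	static void Main()
	{
		var clock = new LamportClock();
		clock.Tick();                    // local event -> 1
		clock.Merge(10);                 // remote message carried time 10
		Console.WriteLine(clock.Tick()); // prints 11
	}
}
```

Tick is called before emitting a log entry; Merge is called with the counter received from another node, so the clock never moves backwards even if a stale time arrives.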

Let’s implement correlator for web applications:

using System;
using System.Web;
using DomainCommons.Correlations;
using DomainCommons.Loggers;

namespace AzureCommons.Correlations
{
	public class RequestHeadersCorrelator : Correlator
	{
		private readonly Lazy<string> _correlationId;

		public RequestHeadersCorrelator()
		{
			_correlationId = new Lazy<string>(GenerateCorrelationId);

			var logicalTime = HttpContext.Current?.Request.Headers[Constants.CorrelationCounterHeader];
			if (logicalTime == null)
			{
				return;
			}

			UpdateLogicalTime(int.Parse(logicalTime));
		}

		public override string GetCorrelationId()
		{
			return _correlationId.Value;
		}

		public override string Activity { get; set; }

		private string GenerateCorrelationId()
		{
			var id = (string)HttpContext.Current?.Items[Constants.CorrelationIdItem] ??
				   HttpContext.Current?.Request.Headers[Constants.CorrelationIdHeader] ??
				   Guid.NewGuid().ToString();

			if (HttpContext.Current != null)
			{
				HttpContext.Current.Items[Constants.CorrelationIdItem] = id;
			}

			return id;
		}
	}
}

The implementation is rather straightforward. The correlation ID and logical time are extracted from the headers or generated if they are absent. The activity is not set yet; that will be done later.
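For the headers to be present at all, the calling side has to send them. Here is a hypothetical sketch of the sending side (the header names match the Constants class; the URL and values are made up):

```csharp
using System;
using System.Net.Http;

class Program
{
	static void Main()
	{
		// Attach correlation headers to an outgoing request so that the
		// downstream service's RequestHeadersCorrelator can pick them up.
		var request = new HttpRequestMessage(HttpMethod.Get, "https://example.org/api/orders");
		request.Headers.Add("X-Correlation-ID", Guid.NewGuid().ToString());
		request.Headers.Add("X-Correlation-Counter", "42");

		Console.WriteLine(request.Headers.Contains("X-Correlation-ID"));
	}
}
```

Propagating the headers to other systems is covered in detail in part 4 of the series.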

We also need to implement a correlator for web jobs. This is pretty easy since it doesn’t need to bother with headers and always generates its data on start:

using System;
using DomainCommons.Correlations;

namespace AzureCommons.Correlations
{
	public class MemoryBasedCorrelator : Correlator
	{
		private readonly string _correlationId = Guid.NewGuid().ToString();

		public override string GetCorrelationId()
		{
			return _correlationId;
		}
		public override string Activity { get; set; }
	}
}

Loggers

We have a similar situation with loggers. Let’s start with the base class:

using System;
using System.Globalization;
using System.Threading;
using DomainCommons.Correlations;

namespace DomainCommons.Loggers
{
	public abstract class Logger : ILogger
	{
		private readonly string _loggerId;

		protected Logger(ICorrelator correlator)
		{
			Correlator = correlator;
			_loggerId = Guid.NewGuid().ToString();
		}

		public void Log(LogLevel level, string message)
		{
			var segments = new object[]
			{
				Timestamp,
				ApplicationName,
				InstanceId,
				Thread.CurrentThread.ManagedThreadId,
				Correlator.GetCorrelationId(),
				level,
				Correlator.Activity,
				Correlator.GetLogicalTime(),
				_loggerId
			};
			LogWithNewLine($"\n[{string.Join("][", segments)}]\n{message}\n");
		}

		public ICorrelator Correlator { get; }
		protected virtual string Timestamp => DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm:ss.fff", CultureInfo.InvariantCulture);
		protected abstract void LogWithNewLine(string message);
		protected abstract string ApplicationName { get; }
		protected abstract string InstanceId { get; }
	}
}

The method for logging a message adds the infrastructure details. The only interesting thing is why we have two IDs: correlation and logger. This is because Sitefinity internally might create multiple requests for each widget, which are in fact completely separate (each widget does its job on its own) but are part of one page (and so they should be correlated). You can treat the logger ID as a scope or something similar.
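To make the format concrete, here is a small standalone snippet that reproduces the header line the Log method builds (all the values below are made up):

```csharp
using System;

class Program
{
	static void Main()
	{
		// Illustrative reconstruction of the segments joined by Logger.Log.
		var segments = new object[]
		{
			"2017-05-01 12:00:00.000", // Timestamp
			"WebApp",                  // ApplicationName
			"node-1",                  // InstanceId
			12,                        // managed thread id
			"9f3c2a",                  // correlation id (shared across widgets)
			"Information",             // level
			"HomePage",                // activity
			4,                         // logical time
			"7b1d"                     // logger id (one per widget request)
		};
		Console.WriteLine($"\n[{string.Join("][", segments)}]\nUser logged in\n");
	}
}
```

Each bracketed segment can then be matched positionally when the logs are parsed later.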

We also add the extra new lines before and after the message because they make parsing logs easier. Since we want to push logs to Azure OMS, we need to parse them in a job handler (which will be shown in one of the next parts).
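As a rough preview of that parsing, the bracketed header line can be split with a simple regular expression (the sample line here is made up):

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
	static void Main()
	{
		// Extract every [...] segment from a header line produced by Logger.Log.
		var header = "[2017-05-01 12:00:00.000][WebApp][node-1][12][9f3c2a][Information][HomePage][4][7b1d]";
		var segments = Regex.Matches(header, @"\[([^\]]*)\]");

		Console.WriteLine(segments.Count); // 9 segments
		foreach (Match m in segments)
			Console.WriteLine(m.Groups[1].Value);
	}
}
```

This is only a sketch; the real parser is shown in part 5 of the series.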

Let’s go with logger for Azure:

using System;
using DomainCommons.Correlations;
using DomainCommons.Loggers;
using Microsoft.Cis.Eventing;

namespace AzureCommons.Loggers
{
	public abstract class AzureLogger : Logger
	{
		protected AzureLogger(string applicationName, ICorrelator correlator) : base(correlator)
		{
			ApplicationName = applicationName;
		}

		protected override string InstanceId => Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID");
		protected override string ApplicationName { get; }
		
		public void MissingListenerDllHack()
		{
			// Hack for missing MonAgentListener.dll
			var messageEvent = new MessageEvent();
		}
	}

}

We extract the node ID from an environment variable which is set automatically by the Azure platform. Also, we have a hack for the missing file MonAgentListener.dll, which sometimes is not copied to the output directory. This library is required for the Azure trace listener and is part of the Azure SDK, so if your build server doesn’t have this SDK installed, you need to add the dll manually to your solution and make sure that it is copied to the bin directory.

Now, logger for web applications:

using System.Diagnostics;
using ConfigurationCommons.Configurations;
using DomainCommons.Correlations;

namespace AzureCommons.Loggers
{
	[RegisterManually]
	public class WebAppLogger : AzureLogger
	{
		public WebAppLogger(string applicationName, ICorrelator correlator) : base(applicationName, correlator)
		{
		}

		protected override void LogWithNewLine(string message)
		{
			Trace.WriteLine(message);
		}
	}
}

Web apps have no way to log anything to the Azure infrastructure other than a trace listener, so we simply use the Trace class. Also, since I register lots of stuff by convention, I mark this logger as registered manually with an attribute. This is only a matter of your DI container and conventions.

The web job logger looks a bit different, since it should use the TextWriter injected by the framework:

using System.IO;
using ConfigurationCommons.Configurations;
using DomainCommons.Correlations;

namespace AzureCommons.Loggers
{
	[RegisterManually]
	public class WebJobLogger : AzureLogger
	{
		private readonly TextWriter _textWriter;

		public WebJobLogger(TextWriter textWriter, string applicationName, ICorrelator correlator) : base(applicationName, correlator)
		{
			_textWriter = textWriter;
		}

		protected override void LogWithNewLine(string message)
		{
			_textWriter.WriteLine(message);
		}
	}
}

And that’s all.

Summary

We have our loggers and correlators and are ready to use them in production code. In the next part we will see how to create them automatically with various action filters.
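To show how the pieces fit together, here is a hypothetical web job function wiring them up manually (the queue name, application name and message handling are illustrative; it assumes the types defined above plus the Azure WebJobs SDK):

```csharp
using System.IO;
using AzureCommons.Correlations;
using AzureCommons.Loggers;
using DomainCommons.Loggers;
using Microsoft.Azure.WebJobs;

public static class Functions
{
	public static void ProcessQueueMessage([QueueTrigger("orders")] string message, TextWriter log)
	{
		var correlator = new MemoryBasedCorrelator { Activity = "ProcessQueueMessage" };
		var logger = new WebJobLogger(log, "OrdersWebJob", correlator);

		logger.Log(LogLevel.Information, $"Processing message: {message}");
	}
}
```

In practice you would let your DI container build the logger; the action filters in the next part remove this boilerplate for web applications.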