This is the story of a debugging session from about six months ago, just before a demo to the customer. It's light on detail because I've reconstructed it from the Slack logs where I vented my frustrations entirely to myself, since everyone else had gone home.

The Problem

We'd been developing some code that day that had not yet been tested. Everyone else had left for the day and I decided to give it a go before I left. I thought it would only take five minutes.

I didn't get very far with the application before it crashed with exception code 0xC0000005, which is an access violation. Since ordinary catch blocks don't get to handle this exception by default, all I was presented with was Windows' 'Program has stopped working' box and no stack trace.

The Investigation

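It was time to fire up WinDbg. I'm very much a novice with WinDbg, but it's something I wanted to learn more about, so this was a good excuse. I used the following command, which loads the SOS extension from the same directory as the CLR, tells the debugger to stop when a System.AccessViolationException is created, and then resumes the process: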

.loadby SOS.dll clr;!soe -Create System.AccessViolationException;g;

And I was greeted by the following stack trace:

0:015> !clrstack
OS Thread Id: 0x1280 (15)
        Child SP               IP Call Site
000000001dfee4d0 000007fed30d0ab9 [InlinedCallFrame: 000000001dfee4d0] HalconDotNet.HalconAPI.HLISetS(IntPtr, Int32, IntPtr)
000000001dfee4d0 000007fe91ed7aa9 [InlinedCallFrame: 000000001dfee4d0] HalconDotNet.HalconAPI.HLISetS(IntPtr, Int32, IntPtr)
000000001dfee4a0 000007fe91ed7aa9 HalconDotNet.HTupleString.StoreData(IntPtr, IntPtr)
000000001dfee550 000007fe91ed78dc HalconDotNet.HTupleImplementation.Store(IntPtr, Int32)
000000001dfee600 000007fe921906fb HalconDotNet.HTuple.TupleConcatOp(HalconDotNet.HTuple)
000000001dfee6b0 000007fe9219024f HalconDotNet.HTuple.TupleConcat(HalconDotNet.HTuple)
000000001dfee710 000007fe9218f8c6 HalconDotNet.HTupleVector.ConvertVectorToTuple()
000000001dfee750 000007fe921dd249 VisionGateway.Gateway.Helpers.Halcon.StartCamerasTupleHelper.GetRoiTuples(VisionGateway.Interfaces.Dto.RegionOfInterestDto, System.String)
000000001dfee850 000007fe921db437 VisionGateway.Gateway.Helpers.Halcon.StartCamerasTupleHelper.ToHalconStructure(VisionGateway.Interfaces.Dto.VisionTemplateDto, System.String)
000000001dfee8f0 000007fe921d5690 VisionGateway.Gateway.Wcf.HalconGateway.LoadTemplate(Int32)
000000001dfee9a0 000007fe921d3103 HardwareSettings.Settings.SetVisionTemplate(Interfaces.Shared.PackagingLevelData, System.Collections.Generic.List`1, Interfaces.ProductionManagement.ProductionOrder)
000000001dfeea80 000007fe921d261d HardwareSettings.Settings.LoadVisionTemplates(Interfaces.Shared.PackagingLevelData, Interfaces.ProductionManagement.ProductionOrder)
000000001dfeeb50 000007fe921d09ba HardwareSettings.Settings.LoadTemplatesIntoGateways(Interfaces.ProductionManagement.ProductionOrder)
000000001dfeec10 000007fe921d0084 ProductionManagement.ProductionManager.b__19_0()

This shows that the exception is thrown inside the Halcon dll. We've had other unusual exceptions thrown by the Halcon libraries, so this wasn't a huge surprise. The last method in the stack that we wrote ourselves is GetRoiTuples, so I started debugging there. I added logging to dump all the parameters passed into that method and got this:

GetRoiTuples called. regionsOfInterest:
{
  "BarcodeModel": null,
  "CalibrationModel": null,
  "CameraId": 7,
  "CheckoutModel": {
    "CheckoutModelId": 15,
    "DataMatrixType": 4,
    "DebounceOverXFrames": 10,
    "MaxGrayThreshold": 255,
    "MinAreaRequiredToConsiderBoxPresent": 60000,
    "MinGrayThreshold": 40,
    "RegionOfInterestId": 15,
    "StaticData": null
  },
  "DataMatrixModel": null,
  "HalconInspectionType": 5,
  "LayoutTemplateCodeName": "Checkout Left",
  "MatchStringModel": null,
  "OcrTextModel": null,
  "RegionOfInterestId": 15,
  "RoiChildren": [],
  "RoiParent": null,
  "RoiParentId": null,
  "ShapeModel": null
}
workingDirectoryPath:
C:\HMI\WorkingDirectory\

The error is actually apparent here, but since I didn't write this piece of code I didn't know whether any of the values were suspicious. At the time this felt like no help at all. I don't know what I expected.

The next step was to attach the remote debugger. I've read a lot of negative things about the remote debugger online. Many people report problems getting the correct version for their version of Visual Studio, but I've yet to have a single issue with it. It's always performed flawlessly for me. This was the result:

This made no sense to me. This wasn't an area I was familiar with but it seemed like safe enough code to not throw. After some more debugging and lots of strange behaviour I realised I was debugging release code. I rebuilt the dll as debug and redeployed. I landed at this line:

This line was causing the exception, and the value in StaticData was null. Now I had an idea and built a test console application in an attempt to recreate the issue. It called the same method in the Halcon dll with the same parameters. This was my application:

And this was my result:

Bingo. So passing a null into the constructor for HTuple causes the exception.

The Cause

With the issue recreated I was happy we could fix it, but where did that null come from? Why was it there?

This is the database containing fixed inspection data. A null in the database caused an AccessViolationException. I don't believe that passing null into a .NET dll should ever cause an AccessViolationException, but this isn't our dll so we don't have the luxury of changing it.

Looking back, the debug output I produced from the method at the start shows this null (StaticData), but at that time I had no idea that null was not a valid value for it.

I left it there for the night and went home. I didn't know what the correct value should be, so we discussed it as a team the next day.

Subsequent Action

After some discussion it turned out that the null value was no longer needed and could be removed. That was an easy fix, but a couple of points remained.

  1. Hours of development time had just been lost to chasing this issue. It was important to not repeat that.
  2. As far as the user is concerned, our application is the user interface for their machine and the underlying Windows system is hidden at all times. Crashing to the desktop for industrial machinery is a disaster. We needed to ensure that future values were validated before being passed into the Halcon method.

The solution was to write a wrapper for creating HTuples that checks for a null input. If the wrapper finds a null value it throws an ArgumentNullException, which is safely handled by our exception handling code further up the chain. This is far superior to crashing to the desktop.

	using System;
	using System.Runtime.CompilerServices;
	using HalconDotNet;

	/// <summary>
	/// This is a utility class for creating <see cref="HTuple"/> objects. It will throw an <see cref="ArgumentNullException"/> if it receives bad input (a null).
	/// This was created because if we don't pick up on the null input, Halcon will throw an access violation when the <see cref="HTuple"/>'s value is used within a procedure.
	/// </summary>
	public class HTupleWrapper
	{
		public HTuple TupleFor(string value, [CallerMemberName] string callerMethodName = null, [CallerLineNumber] int callerLineNumber = 0)
		{
			AssertNotNull(value, callerMethodName, callerLineNumber, nameof(TupleFor));

			return new HTuple(value);
		}

		// bool and int are value types and can never be null, so these overloads need no check.
		// The caller-info parameters are kept only so that every overload has the same shape.
		public HTuple TupleFor(bool value, [CallerMemberName] string callerMethodName = null, [CallerLineNumber] int callerLineNumber = 0)
		{
			return new HTuple(value);
		}

		public HTuple TupleFor(int value, [CallerMemberName] string callerMethodName = null, [CallerLineNumber] int callerLineNumber = 0)
		{
			return new HTuple(value);
		}

		// ReSharper disable once ParameterOnlyUsedForPreconditionCheck.Local -- this is the entire point
		private void AssertNotNull<T>(T value, string callerMethodName, int callerLineNumber, string methodName) where T : class
		{
			if (value == null)
			{
				string logMessage = $"{methodName} received a null argument from method: {callerMethodName} at line: {callerLineNumber}.";
				SharedLog.Log(Severity.Error, logMessage);
				// The (paramName, message) overload is used because the single-string overload treats its argument as the parameter name.
				throw new ArgumentNullException(nameof(value), logMessage);
			}
		}
	}
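
To show how the guard behaves from the caller's side, here is a minimal sketch that compiles without Halcon installed. FakeTuple, FakeTupleFactory and TemplateLoaderDemo are hypothetical stand-ins invented for this illustration; they are not part of HalconDotNet or of our code base, and the real handling further up the chain does more than return a string.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for HalconDotNet.HTuple so the sketch needs no Halcon reference.
public sealed class FakeTuple
{
    public string Value { get; }

    public FakeTuple(string value)
    {
        Value = value;
    }
}

// The same guard idea as HTupleWrapper, reduced to the string case.
public static class FakeTupleFactory
{
    public static FakeTuple TupleFor(
        string value,
        [CallerMemberName] string callerMethodName = null,
        [CallerLineNumber] int callerLineNumber = 0)
    {
        if (value == null)
        {
            // The compiler fills in the caller's method name and line number at the call site.
            throw new ArgumentNullException(
                nameof(value),
                $"{nameof(TupleFor)} received a null argument from method: {callerMethodName} at line: {callerLineNumber}.");
        }

        return new FakeTuple(value);
    }
}

// Hypothetical caller: a managed exception can be handled here instead of the process dying.
public static class TemplateLoaderDemo
{
    public static string LoadStaticData(string staticData)
    {
        try
        {
            FakeTuple tuple = FakeTupleFactory.TupleFor(staticData);
            return "loaded: " + tuple.Value;
        }
        catch (ArgumentNullException ex)
        {
            // ex.ParamName identifies the offending parameter; ex.Message carries the caller details.
            return "rejected: " + ex.ParamName;
        }
    }
}
```

Calling TemplateLoaderDemo.LoadStaticData(null) takes the catch branch, whereas the same null reaching the native Halcon code took the whole application down.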