12 April 2011

Unit Testing with Moles and Pex - Part 1

This post discusses key ways of improving the effectiveness and code coverage of unit tests, using the Visual Studio Test Framework, including the developing Moles and Pex technologies.  Other frameworks and harnesses such as NUnit, Moq, etc. are not discussed in this post.

Moles is Microsoft's code instrumentation, or mock framework, used to isolate environment dependencies.  Moles was created expressly to enable Pex to function.  Moles is discussed in greater detail, after the break.

Pex intelligently examines the logic branches of the code to be tested, and then generates a series of input values that should or should not fail, thereby maximizing test code coverage.

Before we start talking about Moles and Pex, there are some elementary, yet vitally important topics to review about unit testing.  Adhering to these rules plays into the effectiveness of Moles and Pex.  I strongly advise reading this section, before moving on to part 2, but I really can't stop you, can I? (or can I? Muahaha!)

Unit Testing

Test Scope

Poor scoping of unit tests is possibly the most prevalent offense against good testing.  A unit test should test exactly one condition of one function.  If another condition may result from the same function, a second unit test should be created.

An example of this is when the unit of code produces a string type.  The string may be null, empty, or may contain any combination and number of case-sensitive characters, and each of these conditions deserves its own test.  Yes, the tests will be identical up to the point of the value test, but that common part can easily be extracted to a shared method call.  I'm sure you can already imagine how the number of tests can increase exponentially, and that another critical look at further refactoring and compartmentalizing of code is in order.

Knowing and adhering to the definition of a unit test provides scope guidance.  Unit tests check the state and value of a specific piece of code, using specific input values and conditions, to ensure consistency of the expected result.  Unit tests are not integration tests, deployment tests, black box tests, load tests, or any other type of tests.  Unit tests are:
  • Fast
  • Small
  • Independent of environmental dependencies
  • Free of data and other dependencies
  • Tightly scoped
  • Very specific
  • Combined to provide good code coverage
  • Able to run on any test machine without any changes

Code Coverage

The issue of code coverage is a hot topic in the testing community.  Many test developers claim it isn't practical or feasible to provide 100% code coverage through unit testing.  This may or may not be true.  However, you can't argue that high code coverage combined with a large number of assertions produces a high degree of confidence in the reliability of the tests and their ability to detect errors.  Pex helps testers easily generate and build the numerous tests required to achieve up to 100% code coverage.

Preemptive Exception Handling

I recently had a student who complained about having to continually write unit tests for handling NullReferenceException being thrown.  Specifically, they were confused by the scope of handling these exceptions in a unit test: how can a unit test both test for a null reference and perform another test, without violating the very nature of a unit test (test for only one condition, per test)?  The answer is simple: don't write code that can produce a NullReferenceException.  You may think this is an arduous effort, but I'll show you it is easier than you may expect.  (I noted this concept was recently repeated by Jon Skeet, for added credibility.)

Right and proper OOP and unit testing dictate that reference objects passed as input parameters should not be null.  This is probably not actually spelled out anywhere, but it is logically implied that input parameters are there to be used.  Otherwise, your code probably needs refactoring.

This is reinforced by the fact that Pex will generate test cases that fail, due to an unexpected null reference value.  Therefore, it stands to reason that null reference should be handled before passing it through a call.

class Foo
{
  Person GetPerson()
  {
    string firstName = "Mike";
    string lastName = null;
    var p = new Person(firstName, lastName);
    return p;
  }
}

public class Person
{
  public Person(string firstName, string lastName)
  {
    // This constructor performs only two major actions: input
    // validation and setting properties.  This is a proper method.
    if (String.IsNullOrEmpty(firstName))
      throw new ArgumentException("firstName is null or empty.", "firstName");
    if (String.IsNullOrEmpty(lastName))
      throw new ArgumentException("lastName is null or empty.", "lastName");

    FirstName = firstName.Trim();
    LastName = lastName.Trim();
  }

  public Person(string fullname)
  {
    // This constructor performs many actions.  This does not lend
    // to unit testing.

    // 1. Validates input
    if (String.IsNullOrEmpty(fullname))
      throw new ArgumentException("fullname is null or empty.", "fullname");
    
    // 2. String manipulation.
    string value = fullname.Trim();
    var names = value.Split(' ');

    // 3. Parsing validation.
    // Verify more than one name exists.
    if (!names.Skip(1).Any())
      throw new ArgumentException("fullname doesn't appear to contain both first and last name.", "fullname");

    // 4. Sets property values.
    FirstName = names[0];
    LastName = names.Last();
  }

  public string FirstName { get; set; }
  public string LastName { get; set; }
}

This code places the burden of validating the strings in the Person constructor.  Although this is technically correct, as far as OOP principles go, we are testing for unexpected null references.  This creates complications when creating unit tests, mocks, and when implementing Moles and Pex.  The bottom line is to simply avoid the issue entirely.

Let's see how we can refactor the Person constructor overload, to be more conducive to unit testing:

public class Person
{
  public Person(string firstName, string lastName)
  {
    if (String.IsNullOrEmpty(firstName))
      throw new ArgumentException("firstName is null or empty.", "firstName");
    if (String.IsNullOrEmpty(lastName))
      throw new ArgumentException("lastName is null or empty.", "lastName");

    FirstName = firstName.Trim();
    LastName = lastName.Trim();
  }

  public Person(string fullname)
  {
    // This constructor is refactored to perform only two actions:
    // 1. Validate input.
    if (String.IsNullOrEmpty(fullname))
      throw new ArgumentException("fullname is null or empty.", "fullname");

    // Assign property values.  (Calling Trim() is a safe call.)
    string value = fullname.Trim();
    if (!ParseFullName(value))
      throw new ArgumentException("fullname doesn't appear to contain both first and last name.", "fullname");
  }

  private bool ParseFullName(string name)
  {
    // No need to validate input, here.  The input was already
    // validated by the public methods.  Private methods should
    // have no need for further validation, unless they, too,
    // retrieve input from an external resource (Web service,
    // etc.).
    var names = name.Split(' ');

    // Verify more than one name exists.
    if (!names.Skip(1).Any())
      return false;

    // Note no exceptions pertaining to the validity of the
    // input are thrown by this private method.  This method only
    // reports whether both properties were set or not.  The
    // exception should be thrown by the caller.  This is simply
    // code refactored out of the constructor overload.
    FirstName = names[0];
    LastName = names.Last();
    return true;
  }

  public string FirstName { get; set; }
  public string LastName { get; set; }
}

Note in the comments, private methods don't require validation, when they only receive input values that have already been validated by public methods.  Your code should not be doing anything to violate the validation, so there is no need to worry about it.  By this virtue, we are able to intercept and handle NullReferenceExceptions, before passing null references into call parameters -- simply use a little refactoring or added code.  (See, I told you it would be easy!  Feasibility is another issue.)
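
The samples above only show the receiving side (the Person constructor).  As a rough sketch of the calling side, here is one way a caller might check its values before the constructor is ever invoked.  PersonFactory and TryCreate are hypothetical names invented for this illustration and are not part of the original sample; Person is a trimmed copy of the two-argument constructor shown earlier.

```csharp
using System;

// Trimmed copy of the two-argument Person constructor shown earlier.
public class Person
{
  public Person(string firstName, string lastName)
  {
    if (String.IsNullOrEmpty(firstName))
      throw new ArgumentException("firstName is null or empty.", "firstName");
    if (String.IsNullOrEmpty(lastName))
      throw new ArgumentException("lastName is null or empty.", "lastName");

    FirstName = firstName.Trim();
    LastName = lastName.Trim();
  }

  public string FirstName { get; set; }
  public string LastName { get; set; }
}

// Hypothetical helper (an assumption for this sketch): the caller
// checks its own values before handing them to the constructor.
public static class PersonFactory
{
  public static bool TryCreate(string firstName, string lastName, out Person person)
  {
    person = null;

    // Intercept null or empty values here, so they never reach
    // the constructor's parameters.
    if (String.IsNullOrEmpty(firstName) || String.IsNullOrEmpty(lastName))
      return false;

    person = new Person(firstName, lastName);
    return true;
  }
}
```

With something like this in place, Foo.GetPerson could test the return value of TryCreate and decide what to do with a missing last name, instead of relying on the constructor to throw.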

Object Instantiation

I admit, when not developing under TDD, I get lazy.  I do things like:
IEnumerable<String> GetWordsContainingLetterR()
{
  return "The quick brown fox jumps over the lazy dog.".Split(' ').ToList().Where(s => s.Contains('r'));
}

OK, so that's an extreme example. However, it well illustrates how using temporary values defeats unit testing.  There is no way to determine at what point the above code fails, in its current state.  What if it was an empty string?  Will the code fail?  There certainly isn't any effective way to test this case, because there is no way to inject a temporary value.

Please, please, for the sanity of testers everywhere, instantiate your values!  You know this frustration, if you have ever wanted to step through code in debug mode, and were unable to view these temporary values.  The following are examples of good value type instantiation, that greatly aid testing and debugging.  Just do it:
IEnumerable<String> GetWordsContainingLetterR()
{
  var words = "The quick brown fox jumps over the lazy dog.";
  var splitWords = words.Split(' ');
  var returnValue = splitWords.ToList().Where(s => s.Contains('r'));
  return returnValue;
}

bool IsLetterRPresent()
{
  var letterRWords = GetWordsContainingLetterR();
  var returnValue = letterRWords.Any();
  return returnValue;
}

void Foo()
{
  var message = "My favorite blogger of the day is ";
  message += IsLetterRPresent() ? "Jon Skeet" : "Scott Hanselman";
  Console.WriteLine(message);
}

One last item of note: the Foo method can easily be condensed into two lines, each of which calls Console.WriteLine. However, we don't want to write anything, until we know the entire message can be composed without errors.
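
The empty-string question raised earlier can only be answered if the sentence can be injected.  The following is a minimal sketch of that idea, assuming the sentence and the letter become parameters.  WordFilter and GetWordsContainingLetter are hypothetical names chosen for this illustration, not part of the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of GetWordsContainingLetterR: the sentence and
// the letter are parameters, so a test can inject any value it likes.
public static class WordFilter
{
  public static IEnumerable<string> GetWordsContainingLetter(string sentence, char letter)
  {
    if (sentence == null)
      throw new ArgumentNullException("sentence");

    // Each intermediate value gets its own variable, so it can be
    // inspected in the debugger.
    var words = sentence.Split(' ');
    var matches = words.Where(w => w.IndexOf(letter) >= 0).ToList();
    return matches;
  }
}
```

A test can now pass String.Empty and assert that the result contains no words, or pass the original sentence and assert on the exact words returned.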

Correct Use of Assertions

Assertions do as their name suggests -- they assert or ensure a specific condition is true.  Some general rules should be followed, when implementing assertions in unit tests.  This is another area where developers make critical mistakes:
  • Use the correct assertion
  • Use many assertions
  • Only one assertion per unit test (this means many tests!)
  • Use more than one assertion, and therefore another test, when the assertion does not sufficiently test the object state

Use the Correct Assertion

Take a look at the IntelliSense for Assert.Equals().  It says, "Do not use this method."  You're probably looking for Assert.AreEqual().  Some commonly misused and confused assertions include:
  • AreEqual - Are the values of the value or reference types equivalent?
  • AreSame - Do both references point to the same heap object?
  • Equals - Do not use this method!
  • Inconclusive - It is important to use an assertion, even when the results cannot be verified.  Using this allows tests to be revisited in the future for updating or completion.
  • IsInstanceOfType - Don't forget you can verify the type produced by factories and other methods
  • IsNull - Don't use this on a value type -- it will never be true
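
As a small sketch of the difference between a few of these assertions, the tests below use two strings that hold the same text but live in separate heap objects.  The class and test names are made up for this illustration; the Assert methods are the standard ones from Microsoft.VisualStudio.TestTools.UnitTesting.

```csharp
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class AssertionChoiceTests
{
  [TestMethod]
  public void AreEqual_ComparesValues()
  {
    // Built at run time, so the two strings are separate objects
    // that happen to hold the same characters.
    string first = new StringBuilder("Foo").ToString();
    string second = new StringBuilder("Foo").ToString();

    // Passes: the values are equivalent.
    Assert.AreEqual(first, second);
  }

  [TestMethod]
  public void AreNotSame_ComparesReferences()
  {
    string first = new StringBuilder("Foo").ToString();
    string second = new StringBuilder("Foo").ToString();

    // Passes: the references point to different heap objects, so
    // AreSame would fail here even though AreEqual passes.
    Assert.AreNotSame(first, second);
  }

  [TestMethod]
  public void IsInstanceOfType_ChecksRuntimeType()
  {
    object produced = new StringBuilder();

    // Passes: verifies the runtime type of the produced object.
    Assert.IsInstanceOfType(produced, typeof(StringBuilder));
  }
}
```
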

As noted in the Test Scope section, only one value test should exist per unit test.  I will repeat this here, because this is of critical importance: only one assertion should exist in each unit test.  If the value must be tested for many different conditions, then create a unit test for each one.  Here's an example:
[TestMethod()]
public void ReadFooValueTest()
{
  string expected = "Foo";
  string actual;
  actual = Foo_Accessor.ReadFooValue();
  Assert.IsNotNull(actual);
  Assert.AreNotEqual(String.Empty, actual);
  Assert.AreEqual(expected, actual);
}

Packing three assertions into a single unit test means any one of them may fail.  If the IsNotNull assertion fails, the actual object is not tested by the AreNotEqual and AreEqual assertions, which should both also fail.  This is a poor test, because:
  • The test name does not describe the nature of the test
  • Only one failure is reported, instead of up to three failures
  • When the test fails, it is not clear which assertion failed, as a result of the insufficient test name and multiple assertions
  • Early exit from the test due to an assertion failure leaves other assertions untested

Some developers argue that the above code is valid, because the subsequent assertions must fail when the previous one does.  This is true, but relies heavily on the developer's ability to manually ensure things are in correct order, and does not produce comprehensive results.  The above test should produce three failures, and not just one.  Consider what happens when we change the order of the assertions.  That developer's methodology completely falls apart, and the test results lead to a mis-diagnosis.

Instead, we should create the following unit tests:
[TestMethod()]
public void ReadFooValueTest_IsNotNull()
{
  string expected = "Foo";
  string actual;
  actual = Foo_Accessor.ReadFooValue();
  Assert.IsNotNull(actual);
}

[TestMethod()]
public void ReadFooValueTest_IsNotEmpty()
{
  string expected = "Foo";
  string actual;
  actual = Foo_Accessor.ReadFooValue();
  Assert.AreNotEqual(String.Empty, actual);
}

[TestMethod()]
public void ReadFooValueTest_IsExpectedString()
{
  string expected = "Foo";
  string actual;
  actual = Foo_Accessor.ReadFooValue();
  Assert.AreEqual(expected, actual);
}

You'll notice that because we are adhering to the "keep it small" rule of unit tests, extracting the common contents of these three tests is just plain overkill -- it is possible to refactor too much.  Each test contains only one assertion, and the assertion method is reflected in the test name.  When one or more of these tests fail, all tests will appear, and the test names reveal exactly what went wrong.

On to Part 2 (Moles) > >  COMING SOON! (Hey, I'm a busy guy.)

2 comments:

  1. Is this topic connected with your professional field or perhaps is it mostly about your leisure and kinds of spending your free time?

    1. Both.

      I created this Moles reference section on the blog in my free time, but use it (now the Fakes framework) at my day job. I work with a lot of legacy code that uses 3rd party libraries and APIs. You generally don't want to alter legacy code much, as it is often a delicate amalgamation of additions, created over many years, and rarely constructed with unit testing in mind. Detouring calls at the framework level is an ideal solution for this situation. Of course, proper stubbing should be used, whenever possible.

      I created this section of the blog on my own time. I saw that many people had trouble finding help in the official manuals and reference published by Microsoft Research, and thought this would be easier to search and follow.

