Tag Archives: diff

Unit Testing XML – Evaluating Diffs

I am trying to test code that merges two XML files. In the unit test that I am attempting to implement, I want to compare the difference between the merge result and one of the XML files (the larger of the two).

This is a description of the file contents:

  • The right-hand XML file has 13 elements underneath the root, while the left-hand file has 4.
  • Two elements in both files are equivalent, so the left-hand version is discarded.

What I’m expecting is that the two remaining elements in the left hand file are merged into the resultant XML, so that in reference to the right-hand file, the merged content has two additional elements underneath the root.

XMLUnit

I’ve used XMLUnit previously to compare generated XML with expected output. In this case however, I am more concerned with evaluating the differences between the source and resultant XML.

XMLUnit has a Diff object – org.custommonkey.xmlunit.Diff – which I realised, after some investigation, doesn’t quite offer a diff in the traditional UNIX diff/patch command sense. It evaluates a document in terms of being identical, similar or different, and holds a message describing the first difference encountered.

This has some use of course, my testcase could look something like this:

...
Diff myDiff = new Diff(originalXml, mergedXml);
assertFalse("Expected differences in XML", myDiff.identical());

The message contained in myDiff here is:

[different] Expected number of child nodes '13' but was '17' - comparing  at /Group[1] to  at /Group[1]

I’m not quite contented with that as a robust unit test. I want to ensure that the two files have specific differences. When I think of comparing two files with a diff, I’m picturing a visual diff:

..and the concept of a patch – that the set of +/- lines differences are collected and made available for inspection/verification.

XMLUnit does have some alternatives that, while not exactly what I’m looking for, I could use and are worth discussing:

DetailedDiff

DetailedDiff is an extension of Diff, which will give me a list of all the Differences in the comparison.

final DetailedDiff diff = new DetailedDiff(myDiff);
assertTrue("Expected a difference in child nodes", 
    diff.getAllDifferences()
        .contains(DifferenceConstants.CHILD_NODE_NOT_FOUND));

..will assert that the comparison has resulted in a mismatch in the number of children between the two XMLs (Javadoc). A few of those assertions could describe the expected differences between the XMLs.

Counting Nodes

CountingNodeTester is another alternative that allows us to assert the total number of elements contained in the XML:

CountingNodeTester countingNodeTester = 
    new CountingNodeTester(38);
assertNodeTestPasses(mergedXml, 
    countingNodeTester, Node.ELEMENT_NODE);

Or alternatively, comparing the counts of the two XMLs (7 additional nodes):

final int countOriginal = 31;
final int countMerged = countOriginal + 7;
CountingNodeTester countingNodeOriginal = 
    new CountingNodeTester(countOriginal);
CountingNodeTester countingNodeMerged = 
    new CountingNodeTester(countMerged);
assertNodeTestPasses(originalXml, 
    countingNodeOriginal, Node.ELEMENT_NODE);
assertNodeTestPasses(mergedXml, 
    countingNodeMerged, Node.ELEMENT_NODE);

XPath

Finally we could also use XPath evaluations to assert the existence or lack of certain structures:

assertXpathNotExists("/Group/PageContainer[6]", 
   mergedXml);
...
assertXpathExists("/Group/PageContainer/External-Group/File[@Location='/Sites/centre/dcita/site.xml']", 
    mergedXml);
assertXpathEvaluatesTo("/Sites/centre/dcita/site.xml", 
    "/Group/PageContainer/External-Group/File/@Location", mergedXml);

So while XMLUnit gives us a pretty good toolset for XML comparisons, I’m still wondering if there’s a more diff-oriented tool I could use.

java-diff-utils

I found java-diff-utils, which looks like it could be a good option for handling diffs the way I’m imagining. Lets have a go!

The sample code on the website shows us the basic usage:

// Compute diff. Get the Patch object. Patch is the container for computed deltas.
Patch patch = DiffUtils.diff(original, revised);

..where original and revised are List objects.

The Patch object gives us a list of Deltas, containing the ‘original’ and ‘revised’ segments. Perfect!

This matches what we have in the image (disregarding the empty elements formatted differently)

Assert.assertEquals(1, patch.getDeltas().size());

The Delta itself contains a list of text lines, so we could potentially verify the list of strings manually.

System.out.println(patch.getDelta(0).getRevised().getLines());

..outputs something like:

[<PageContainer>
, <External-Group>
, <File Location="/Sites/centre/dcita/site.xml">, </File>
, </External-Group>
, </PageContainer>
, <PageContainer>
, <Page Title="">
, <File Location="/Content/centre/dcita/index.xml">, </File>
, <Description>Migrated from previous CMS1 Homepage, </Description>
, </Page>
, </PageContainer>
    ]

A cleaner scenario could involve saving the expected delta text to a file (/src/test/resources/merged_xml.diff), and comparing the file contents to the lines in the actual patch delta

final String target = resourceToString("/merged_xml.diff");
final String actual = 
    listToString(patch.getDelta(0).getRevised().getLines());
Assert.assertEquals(target, actual);

This needs a couple of helper functions to load the diff file into a String, and convert the delta List into a String also.

    private String resourceToString(String filename) {
        StringBuffer lines = new StringBuffer();
        String line = "";
        try {
            BufferedReader in = new BufferedReader(
                new FileReader(getClass().getResource(filename).getPath()));
            while ((line = in.readLine()) != null) {
                lines.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return lines.toString();
    }

    private String listToString(List list) {
        StringBuffer buff = new StringBuffer(list.size());
        for (Object o : list) {
            buff.append(((String) o).trim());
        }
        return buff.toString();
    }

Finally…

So in the end, my test case ends with combining the above diff utils snippets:

        ...
        Patch patch = DiffUtils.diff(original, revised);
        Assert.assertEquals(1, patch.getDeltas().size());

        final String target = resourceToString("/merged_xml.diff");
        final String actual = 
            listToString(patch.getDelta(0).getRevised().getLines());
        Assert.assertEquals(target, actual);