Stress Testing Algorithmic Code

I keep a “To Do” list stored in a text file. I add sub-tasks to tasks by putting them on the line below the task at a greater depth of indentation. This works for me as I find that whenever I look at any task, I can always think of ways to break it down further. Also, I get to see everything in one place.

A few years ago, I decided to write a tool to sort my “To Do” list:

https://github.com/robert-impey/tree-sorter

The complex part was parsing tree structures of arbitrary depth and sorting each tree and subtree independently.

For example,

II Animals
  2 Dogs
  3 Cats
  1 Fish
I Plants
  b Nettles
  c Oak
  a Grass

becomes

I Plants
  a Grass
  b Nettles
  c Oak
II Animals
  1 Fish
  2 Dogs
  3 Cats

This works fine for my tasks. My list is saved in a file and a scheduled task sorts it periodically.

However, one annoyance was that if the file was was already in order, the program would sort and write out anyway. Another problem, was that I was never very pleased with the algorithm that I wrote. Although the performance was acceptable for the 300 odd lines of my file, I suspect that I can do better. I already had in place a range of unit tests and sample files as fixtures. I could just start by trying to improve the data structures and algorithm at the heart of the program. However, I want to be really sure that I don’t introduce a regression for a particularly tricky tree that I hadn’t thought of.

Fortunately, the input and the output of the algorithm are easy to describe. The input is any tree structure in space indented lines in a text file; the output is those lines in order. I decided to add a stress test to the program. The test would randomly generate trees of a given size and depth. The sorting code would then attempt to put them into order. If the result is not in order, the program halts and prints out the offending input and output. Otherwise, the program would keep looping round (generating, sorting and checking trees) until I killed the process.

while True:
    random_tree_lines = generate_random_tree_lines(
        args.Depth,
        args.Items,
        args.Length,
        args.Alphabet)
 
    tree = lines_to_tree(random_tree_lines)
 
    lines = tree.get_lines()
 
    if not are_lines_sorted_tree(lines):
        print('A tree was not sorted correctly')
 
        print('Input')
        for line in random_tree_lines:
            print(line)
 
        print('Output')
        for line in lines:
            print(line)
 
        break q

The complete listing is at:

https://github.com/robert-impey/tree-sorter/blob/master/stresstest.py

Such a stress test is intended to be a supplement to unit tests. If a tree is not sorted correctly, it should be simplified and added to the unit test fixtures. Then, once the code has been fixed to deal with the tricky input, we know that we can avoid regressions. Generally, random generators are less forgiving than developers when it comes to coming up with tricky inputs.

This stress test will keep running forever if left uninterrupted. Typically, if there is an error anywhere in a system of this level of complexity, it will be found within seconds. The stress test didn’t find an error conditions in the existing code. However, it did take a few iterations to iron out a few off-by-one errors in my code for testing that a tree was in order! This helped me build up the unit tests for that part of the program.

As well as allowing me to make changes to the algorithm at the heart of the program, having a function that tells me if a tree is in order or not allows me to skip overwriting the existing file if the file is already in order:

lines = get_lines(file_name)
 
if in_place and are_lines_sorted_tree(lines):
    print('In place sorting requested but already sorted.')
else:
    # Sorting work...

The complete listing is here:
https://github.com/robert-impey/tree-sorter/blob/master/treesorter.py

Leave a Reply