Example-based testing
Example-based tests are simple to write and easy to read. People think in examples: when we have to explain a complex topic to someone, we reach for one. Examples are also a great way to show the people who come after you how your software works. The problem, though, is that examples do not guarantee a complete and safe implementation. They cover only a small portion of the entire input set that a piece of functionality can take, so relying on a handful of constant inputs says nothing about how your system behaves with other kinds of inputs. This is one of the arguments for property-based testing, but before we go there, let’s look at this sample:
For Arcus, we had to implement our own SQL connection string parsing functionality so we could remove a Microsoft SQL package. This would simplify our dependencies and the library as a whole. We only needed the initial catalog and data source properties of the SQL connection string, so that we could use those when tracking SQL dependencies.
Testing this with an example would look something like this:
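A minimal sketch of such an example-based test, assuming xUnit. The `SqlConnectionStringParser` class and its naive `Parse` body are hypothetical stand-ins written for this illustration, not the Arcus implementation; only the names `Parse` and `SqlConnectionStringParserResult` come from this article.

```csharp
using System;
using System.Linq;
using Xunit;

// Hypothetical result type, named after the class mentioned in this article.
public record SqlConnectionStringParserResult(string DataSource, string InitialCatalog);

public static class SqlConnectionStringParser
{
    // Naive stand-in: only understands the exact keys "Data Source" and "Initial Catalog".
    public static SqlConnectionStringParserResult Parse(string connectionString)
    {
        var pairs = connectionString
            .Split(';', StringSplitOptions.RemoveEmptyEntries)
            .Select(pair => pair.Split('=', 2))
            .ToDictionary(parts => parts[0], parts => parts[1]);

        return new SqlConnectionStringParserResult(pairs["Data Source"], pairs["Initial Catalog"]);
    }
}

public class SqlConnectionStringParserTests
{
    [Fact]
    public void Parse_WithDataSourceAndInitialCatalog_ReturnsBoth()
    {
        // One hand-picked input, one expected outcome.
        var result = SqlConnectionStringParser.Parse(
            "Data Source=my-server;Initial Catalog=my-database");

        Assert.Equal("my-server", result.DataSource);
        Assert.Equal("my-database", result.InitialCatalog);
    }
}
```

The test passes, yet it only tells us something about this one well-formed input.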
🚩 Note that the `SqlConnectionStringParserResult` class and `Parse` method are only educational here. We will not go over the actual implementation of how to extract the initial catalog and data source from the SQL connection string. All you have to know here is that the `Parse` method does the parsing and that the `SqlConnectionStringParserResult` holds the result of the parsing.
Data-driven testing
So what’s the problem here? If the implementation can handle this example, is that enough? An experienced tester would say ‘no’, as the input is seldom what you expect. Testing a single, simple, successful input only guarantees that one input, not the endless others. SQL connection strings are built up from key/value pairs separated by semicolons. The key names are case-insensitive and can have multiple aliases. The order of the pairs does not matter, the values can contain special characters, and all kinds of whitespace characters can pollute each pair. This generates a lot of possibilities that the shown example does not cover.
🚩 We will be using Bogus for any kind of generation from now on.
We need to generate all the available SQL properties, but first let’s look at how we can generate our initial example structure:
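A minimal sketch of such a starting point, assuming the Bogus `Faker` class. The `SqlConnectionStringGenerator` class and its helper names are hypothetical and chosen for this illustration; padding is limited to plain spaces and values to alphanumeric characters to keep the sketch small.

```csharp
using System.Linq;
using Bogus;

public static class SqlConnectionStringGenerator
{
    private static readonly Faker Fake = new Faker();

    // Hypothetical helper: builds "<Data Source>=<value>;<Initial Catalog>=<value>"
    // with random values, random key casing and random padding around each part.
    public static string Generate()
    {
        string dataSource = Pair("Data Source", Fake.Random.AlphaNumeric(10));
        string initialCatalog = Pair("Initial Catalog", Fake.Random.AlphaNumeric(10));

        return $"{dataSource};{initialCatalog}";
    }

    private static string Pair(string key, string value) =>
        $"{Pad(RandomCase(key))}={Pad(value)}";

    // Keys are case-insensitive, so flip each character to upper or lower case at random.
    private static string RandomCase(string text) =>
        string.Concat(text.Select(c =>
            Fake.Random.Bool() ? char.ToUpperInvariant(c) : char.ToLowerInvariant(c)));

    // Surround the text with zero to three spaces on each side.
    private static string Pad(string text) =>
        new string(' ', Fake.Random.Int(0, 3)) + text + new string(' ', Fake.Random.Int(0, 3));
}
```

Every call to `SqlConnectionStringGenerator.Generate()` yields a different but structurally identical connection string.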
These are the key building blocks of randomizing your connection string. This is not complete, though, as we have not yet taken into account all the other types of inputs or aliases. The actual implementation is too much to show here, but here’s the link to the GitHub gist where you can look at it in more detail.
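As a compressed sketch of those remaining variations (not the code from the gist), key aliases and pair ordering could be randomized along these lines. The class name is hypothetical, and the two unrelated properties are only there as noise; the aliases listed are the documented synonyms for the data source and initial catalog keywords.

```csharp
using System.Collections.Generic;
using Bogus;

public static class SqlConnectionStringVariations
{
    private static readonly Faker Fake = new Faker();

    // Synonyms that SQL connection strings accept for the two keys we care about.
    private static readonly string[] DataSourceKeys =
        { "Data Source", "Server", "Address", "Addr", "Network Address" };
    private static readonly string[] InitialCatalogKeys =
        { "Initial Catalog", "Database" };

    public static string Generate()
    {
        var pairs = new List<string>
        {
            // Pick a random alias for each key we want to parse.
            $"{Fake.Random.ArrayElement(DataSourceKeys)}={Fake.Random.AlphaNumeric(10)}",
            $"{Fake.Random.ArrayElement(InitialCatalogKeys)}={Fake.Random.AlphaNumeric(10)}",
            // Unrelated properties that the parser should simply skip over.
            $"Connect Timeout={Fake.Random.Int(1, 60)}",
            $"Application Name={Fake.Random.AlphaNumeric(8)}"
        };

        // The order of the pairs does not matter, so shuffle before joining.
        return string.Join(";", Fake.Random.Shuffle(pairs));
    }
}
```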
The key thing to remember is that the entire SQL connection string input should be generated: from the character case to the whitespace, from the key name aliases to the order in which the items are joined together. If you get that far, you will be faced with another problem: how to assert the outcome of the parsing?
Property-based testing
How do you know what the data source and initial catalog should be if you generate the entire connection string? The answer here is fairly straightforward: we used the original implementation from the Microsoft SQL package to check if our parsed results match the results that the `SqlConnectionStringBuilder` found. This is called a ‘test oracle’. In property-based testing you never know the exact input. You can only draw conclusions based on the output. In this case, we don’t know what the exact data source or initial catalog values would be after our generated SQL connection string gets parsed, but we do know they should be the same as the ones in the `SqlConnectionStringBuilder`.
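The oracle comparison could be sketched as a small assertion helper. This assumes xUnit and the `Microsoft.Data.SqlClient` package (the article does not say which Microsoft SQL package was used, and `System.Data.SqlClient` offers a builder with the same two properties); the `SqlConnectionStringOracle` class is a hypothetical name.

```csharp
using Microsoft.Data.SqlClient;
using Xunit;

public static class SqlConnectionStringOracle
{
    // Compares the values found by the parser under test with what the
    // reference implementation extracts from the very same input.
    public static void AssertMatches(
        string connectionString,
        string actualDataSource,
        string actualInitialCatalog)
    {
        var expected = new SqlConnectionStringBuilder(connectionString);

        Assert.Equal(expected.DataSource, actualDataSource);
        Assert.Equal(expected.InitialCatalog, actualInitialCatalog);
    }
}
```

The helper never needs to know which values were generated; it only checks that both implementations agree on them.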
Now that we can generate our SQL connection string and assert it, we can use it in our test. In the following code sample, you will see that we make use of the xUnit `[MemberData]` attribute to generate not one, but 100 SQL connection strings. This is to make sure that we are testing all sorts of variations. As you saw in the generation code, we have a lot of options and variations on property aliases and character values. Using a whole set of inputs instead of just one makes sure that we test ‘enough’ inputs to feel safe about our implementation.
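A minimal sketch of such a data-driven test, assuming xUnit, Bogus and `Microsoft.Data.SqlClient`. The generator is deliberately reduced to key aliases, and the private `Parse` method is a hypothetical stand-in for the parser under test, so that the sample stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Bogus;
using Microsoft.Data.SqlClient;
using Xunit;

public class GeneratedSqlConnectionStringTests
{
    private static readonly Faker Fake = new Faker();

    // Reduced generator: a real one would also vary casing, whitespace and order.
    private static string GenerateConnectionString() =>
        $"{Fake.Random.ArrayElement(new[] { "Data Source", "Server", "Addr" })}={Fake.Random.AlphaNumeric(10)};" +
        $"{Fake.Random.ArrayElement(new[] { "Initial Catalog", "Database" })}={Fake.Random.AlphaNumeric(10)}";

    // 100 generated inputs, each wrapped as one argument list for the theory.
    public static IEnumerable<object[]> ConnectionStrings =>
        Enumerable.Range(0, 100).Select(_ => new object[] { GenerateConnectionString() });

    // Random data differs between discovery and execution, so skip enumeration at discovery.
    [Theory]
    [MemberData(nameof(ConnectionStrings), DisableDiscoveryEnumeration = true)]
    public void Parse_WithGeneratedConnectionString_MatchesBuilder(string connectionString)
    {
        // Test oracle: the reference implementation tells us what to expect.
        var expected = new SqlConnectionStringBuilder(connectionString);

        var actual = Parse(connectionString);

        Assert.Equal(expected.DataSource, actual.DataSource);
        Assert.Equal(expected.InitialCatalog, actual.InitialCatalog);
    }

    // Hypothetical stand-in for the parser under test.
    private static (string DataSource, string InitialCatalog) Parse(string connectionString)
    {
        var values = connectionString
            .Split(';', StringSplitOptions.RemoveEmptyEntries)
            .Select(pair => pair.Split('=', 2))
            .ToDictionary(p => p[0].Trim(), p => p[1].Trim(), StringComparer.OrdinalIgnoreCase);

        string Find(params string[] keys) =>
            keys.Where(values.ContainsKey).Select(key => values[key]).FirstOrDefault() ?? "";

        return (Find("Data Source", "Server", "Addr"), Find("Initial Catalog", "Database"));
    }
}
```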
This last code sample is not far away from property-based testing. One of the crucial things missing here is shrinking. On a failed test, a shrinker makes sure that the generated input gets ‘shrunk down’ to the simplest input that still causes the failure. If a test fails in our example, it will still report the original, complex generated connection string. That may be the last push people need to switch over to an actual property-based testing framework like FsCheck.
Here’s an example of how the generation logic can be passed on to such an FsCheck generator. To distinguish between regular `string` values and SQL connection strings, we can make use of a discriminated union.
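A rough C# sketch of the idea, assuming the FsCheck 2.x API together with FsCheck.Xunit and `Microsoft.Data.SqlClient`. C# has no discriminated unions, so this sketch swaps in a small wrapper record that plays the same role; `SqlConnectionString`, `SqlArbitraries` and the private `Parse` stand-in are hypothetical names for this illustration.

```csharp
using System;
using System.Linq;
using FsCheck;
using FsCheck.Xunit;
using Microsoft.Data.SqlClient;
using Xunit;

// Wrapper type, so FsCheck can tell generated connection strings apart from plain strings.
public record SqlConnectionString(string Value);

public static class SqlArbitraries
{
    // FsCheck picks up public static members that return an Arbitrary<T>.
    public static Arbitrary<SqlConnectionString> ConnectionStrings()
    {
        Gen<SqlConnectionString> generator =
            from dataSourceKey in Gen.Elements("Data Source", "Server", "Addr")
            from catalogKey in Gen.Elements("Initial Catalog", "Database")
            from dataSource in Gen.Choose(0, 9999).Select(n => $"server{n}")
            from catalog in Gen.Choose(0, 9999).Select(n => $"database{n}")
            select new SqlConnectionString(
                $"{dataSourceKey}={dataSource};{catalogKey}={catalog}");

        return Arb.From(generator);
    }
}

public class SqlConnectionStringProperties
{
    [Property(Arbitrary = new[] { typeof(SqlArbitraries) })]
    public void Parsed_values_match_the_builder(SqlConnectionString input)
    {
        // Test oracle: the reference implementation tells us what to expect.
        var expected = new SqlConnectionStringBuilder(input.Value);

        var actual = Parse(input.Value);

        Assert.Equal(expected.DataSource, actual.DataSource);
        Assert.Equal(expected.InitialCatalog, actual.InitialCatalog);
    }

    // Hypothetical stand-in for the parser under test.
    private static (string DataSource, string InitialCatalog) Parse(string connectionString)
    {
        var values = connectionString
            .Split(';', StringSplitOptions.RemoveEmptyEntries)
            .Select(pair => pair.Split('=', 2))
            .ToDictionary(p => p[0].Trim(), p => p[1].Trim(), StringComparer.OrdinalIgnoreCase);

        string Find(params string[] keys) =>
            keys.Where(values.ContainsKey).Select(key => values[key]).FirstOrDefault() ?? "";

        return (Find("Data Source", "Server", "Addr"), Find("Initial Catalog", "Database"));
    }
}
```

An arbitrary built from a generator alone does not shrink; to get the ‘shrunk down’ failures described earlier, a shrinker for the wrapper type would still have to be supplied.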
Conclusion
Property-based testing has a reputation of being hard to read, difficult to understand, and tricky to convince people to use. Often, you see people trying to make their tests more robust without knowing that what they are looking for is property-based testing. Data-driven tests can be a bridge that helps us understand why we should consider test properties instead of static data. The connection string example shows why such functionality needs to be tested more thoroughly. Some will stop at data-driven tests, but after some test runs and failing tests during the growth of the project, people will want a better defect detection system to determine what went wrong. When this is the case, property-based testing could become the next practice of choice.
Thanks for reading!
Stijn