In this series, we are going to look at software testing, how we can incorporate that into our CFD coding projects and how we can run tests automatically to ensure that we have a bug-free codebase. In this opening article, we will define common terminology and get an intuitive understanding of how tests are structured and written. We will also see how previous tests that we wrote in other series fit into this testing framework.
By the end of this article, you will know about the common tests that we employ in automated software testing and how often you should execute each type of test, and we will look at a few code snippets to get an understanding of how we can structure automated test code.
In this series
- Part 1: How to get started with software testing for CFD codes
- Part 2: The test-driven development for writing bug-free CFD code
- Part 3: How to get started with gtest in C++ for CFD development
- Part 4: How To Test A CGNS-based Mesh Reading Library Using gtest
- Part 5: How To Test A Linear Algebra Solver Library Using gtest
- Part 6: How to use mocking in CFD test code using gtest and gmock
- Part 7: What is test coverage and how to use LCOV/GCOV for testing
In this article
Introduction
This series has been a long time coming. If you have followed along with the previous two series, where we developed a linear algebra solver library and a mesh reading library using the CGNS file format, you will have heard me talking about the importance of software testing. Without realising it, we did follow conventional software testing approaches, albeit not very thoroughly, and we wrote some basic tests to ensure the correctness of the code we developed.
In this series, then, I want to look at software testing more formally. If I had to pick what I deem to be the most important aspect of modern software development, software testing would be my first choice. Nothing has as much of an influence on the way we write software. If you have never worked with an automated way to test software, you owe it to yourself to give it at least a try, and I promise you that you will look at programming differently once you embrace software testing.
There are countless sources of information out there on software testing, so why yet another source of strong opinions and more code? CFD applications tend to be old-fashioned, console-based applications. That makes things a bit easier for testing, but I invite you to do a quick search on software testing and you will see the abundance of sources talking about testing of mobile applications, website testing, graphical user interface testing, user acceptance testing, and the list goes on.
Testing a mobile application is different from testing a website, which is again different from testing a graphical user interface. And, believe it or not, a CFD solver is very different from Google, so you would naturally write different tests for a search engine website than you would for testing an implementation of the Spalart-Allmaras turbulence model. And this is why this series exists.
Using software testing will incur some overhead; you need to write test code, and this will take your time and effort away from writing new features in your CFD solver, but spending that additional time will save you time later. It is akin to switching from Windows to Linux or from MS Word to LaTeX: there is time involved in making the switch, but once you know your way around, you get fewer headaches in the future and thus save time.
In today’s article, then, I want to introduce software testing and the most important building blocks. We will use this knowledge to formulate a testing strategy in the next article and look at testing frameworks and how to write tests for CFD applications in subsequent articles. Let’s get started and see how we can introduce testing into our codebase.
The different testing approaches
Testing software can broadly be categorised into two different areas: manual and automated approaches. Both have their place in the software development cycle and we will explore both of them below.
Manual testing
When we write code, we have to ensure that it is working as intended. So let’s say we want to write an interpolation class that interpolates values from cell centroids to face centroids. We may derive different subclasses from this class to implement different interpolation (numerical) schemes. This is a common operation in the finite volume context, and we typically use interpolation schemes such as upwind, central, MUSCL, ENO, and WENO schemes, to name but a few.
So let’s look then at a function that could implement a central interpolation. The function will accept the value of a quantity \phi at two centroids connected by a face. \phi could be any quantity such as the velocity, density, temperature, etc. We’ll name the values of \phi at the cell centroids phiOwner and phiNeighbour. We also receive the distances from the centroids to the face centroid, which we call distanceOwner and distanceNeighbour. We want to calculate the value of \phi at the face centroid. This is shown below schematically.
To calculate the interpolated face value, we first have to determine the distance between the two centroids and then divide both the distanceOwner and distanceNeighbour variables by that value to get a normalised distance. This means we can multiply each centroid value by this normalised distance, which now essentially acts as a weight; the closer one centroid is to the face, the more weight it will have in the interpolation. The implementation then becomes:
double getFaceCentroidValue(double phiOwner, double distanceOwner, double phiNeighbour, double distanceNeighbour) {
    double distanceOwnerNormalised = distanceOwner / (distanceOwner + distanceNeighbour);
    double distanceNeighbourNormalised = distanceNeighbour / (distanceOwner + distanceNeighbour);
    return phiOwner * distanceOwnerNormalised + phiNeighbour * distanceNeighbourNormalised;
}
If we wanted to test that this function is working correctly (spoiler alert, it is not; you can pat yourself on the back if you spotted the mistake), then we may write code like the following:
#include <iostream>

double getFaceCentroidValue(double phiOwner, double distanceOwner, double phiNeighbour, double distanceNeighbour) {
    double distanceOwnerNormalised = distanceOwner / (distanceOwner + distanceNeighbour);
    double distanceNeighbourNormalised = distanceNeighbour / (distanceOwner + distanceNeighbour);
    return phiOwner * distanceOwnerNormalised + phiNeighbour * distanceNeighbourNormalised;
}

int main() {
    double distanceOwner = 0.5;
    double distanceNeighbour = 0.5;
    double phiOwner = 0.0;
    double phiNeighbour = 1.0;

    std::cout << getFaceCentroidValue(phiOwner, distanceOwner, phiNeighbour, distanceNeighbour) << std::endl;

    return 0;
}
Since both distances are equal, we would expect an interpolated value of 0.5, which sits halfway between 0 and 1, i.e. the values of phiOwner and phiNeighbour. We print the interpolated value to the console with std::cout, see that it is indeed 0.5, conclude that the code is working correctly, remove the print line and then move on.
This way of analysing the code is known as caveman debugging, a name that reflects the fact that we have better ways of testing available. Despite the insulting name, caveman debugging is a perfectly legitimate debugging technique. If you need to quickly go through your code and ensure it is doing what it is supposed to, there is no problem with that. The problem only arises when this is the only technique you rely on for testing and debugging your code.
The problem with this technique is that a critical bug just made it into our production code and we didn’t spot it. Had we used different distances, say, distanceOwner = 0.2 and distanceNeighbour = 0.8, for which we would have expected an interpolated value of 0.2, then we would have realised that the code is broken, as it returns a value of 0.8. It turns out we need to swap the distances, so instead of having:
return phiOwner * distanceOwnerNormalised + phiNeighbour * distanceNeighbourNormalised;
this ought to be
return phiOwner * distanceNeighbourNormalised + phiNeighbour * distanceOwnerNormalised;
You can see that for yourself by making one of the distances smaller. The value of \phi at the centroid closest to the face should have the largest influence on the interpolation, and thus it needs to be weighted by the larger of the two normalised distances. You can show this mathematically as well, but the above is a more intuitive way of seeing it.
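For completeness, the corrected interpolation function then reads as follows; only the final return statement has changed compared to the version above:

double getFaceCentroidValue(double phiOwner, double distanceOwner, double phiNeighbour, double distanceNeighbour) {
    double distanceOwnerNormalised = distanceOwner / (distanceOwner + distanceNeighbour);
    double distanceNeighbourNormalised = distanceNeighbour / (distanceOwner + distanceNeighbour);

    // each centroid value is weighted by the opposite normalised distance, so the
    // centroid closer to the face receives the larger weight
    return phiOwner * distanceNeighbourNormalised + phiNeighbour * distanceOwnerNormalised;
}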
But let’s say we spotted that issue and correctly implemented the interpolation function; then we are all good, right? What if you later decide to rewrite part of your code because you need to restructure it to make room for some additional implementation? For example, say we decide to implement the finite difference method as well and now need a modified base interpolation class to accommodate the additional discretisation method. In that case, we may have to touch this code again. And as soon as we do, we need to test it again.
So you say, well, then I just never touch old code again, and I don’t have any problem with bugs creeping into old code. That is correct, but you are now shifting the problem to a different area. Code that grows needs to be constantly rewritten, a process called refactoring.
Refactoring saves you from incurring technical debt, a phrase coined in analogy to a bank loan. With a loan, you can spend more money but have to pay interest to do so. Without refactoring, you get to write new code without touching the old code, but eventually you will end up with messy code that slows down your development, as it becomes ever more complicated to understand even your own code. If you have ever written messy code and needed to look at it again 6 months later, you will know what I mean.
So, testing is an essential part of development, but we don’t want to waste our time doing it over and over again; this is where automated tests come in.
Automated testing
The code we write for automated tests is not too dissimilar to the code we write when we test manually, but the way we organise it is different. Testing now becomes an integral part of your coding workflow, and automated tests allow you to run all your tests whenever you want to ensure that the changes you made did not break the existing codebase, typically before integrating a new feature into your main codebase.
Staying with the getFaceCentroidValue() function for the moment, if we wanted to provide some form of automated test for it, we would write another function that simply calls it with some values, for example:
#include <cassert>

void testGetFaceCentroidValue() {
    assert(getFaceCentroidValue(0.0, 0.00, 1.0, 1.00) == 0.00);
    assert(getFaceCentroidValue(0.0, 0.25, 1.0, 0.75) == 0.25);
    assert(getFaceCentroidValue(0.0, 0.50, 1.0, 0.50) == 0.50);
    assert(getFaceCentroidValue(0.0, 0.75, 1.0, 0.25) == 0.75);
    assert(getFaceCentroidValue(0.0, 1.00, 1.0, 0.00) == 1.00);
}
The way that test code typically works is by executing the function that should be tested (also called the system under test, or sut) and comparing the returned value against an expected value. This is what we are doing with the assert statement, and if all the expected values are correct, the test function will return without stopping execution.
We have already seen the assert macro in several other places in our previous projects on writing a linear algebra solver and mesh reading library, where it made sure that certain conditions were met before continuing. We learned that this is the fail-fast approach. Automated tests together with the fail-fast approach give us the best protection against software bugs. If you follow this approach rigorously, you will find it very difficult to write code that contains any bugs.
So, if we are now writing test code in addition to the existing production code, we need to separate the two into their own workspaces. We typically do that by providing a separate test folder in our project. The following shows a very common structure of a project with tests.
project
├── include/
├── src/
└── tests/
    ├── resources/
    ├── testType1/
    ├── testType2/
    ├── testType3/
    └── ...
Within our project, we have our header files within the include/ folder, while all corresponding source files are located in the src/ folder. The new tests/ folder houses all of our tests. There is a resources/ folder, which you may or may not have, where you put any files you need to execute your tests. In our previous series on mesh reading, we wrote tests for simplified structured and unstructured grids, so these grids may be located in the resources/ folder. Anything needed by the tests which is otherwise not used by the code goes into the resources/ folder.
Then we see that we can have different types of tests, which are all separated into their own subdirectories. We will discuss the different types of tests later, but for now, it is sufficient to understand that we write code that tests different aspects of our production code (i.e. the code within our src/ folder).
Behaviour vs Implementation testing
There are two categories into which we can separate our automated testing efforts: behaviour and implementation testing. The former is better known as black-box testing, while the latter is known as white-box testing.
Black-box testing
In black-box testing, we assume we know nothing about the code. For example, a user of a mobile app, a website, or a graphical user interface knows nothing about the code, but they are trying to do something very specific on the app, website, or application.
If we had a CFD solver with a graphical user interface and wanted to set up a simulation and run that, then we have a very specific behaviour that we expect from the software; based on the inputs provided, and given that the inputs are all correct, we expect the solver to produce a set of results which we can validate against reference data.
Behavioural tests formalise this process and try to verify that the expected behaviour of the code is correct. If you want to write a behaviour test for your CFD solver, with or without a graphical user interface, you can do so by providing a test case for which an analytic solution (or high-fidelity reference data) is available and then performing a simulation on it. You can then say that, if certain integral quantities or RMS errors are within an acceptable range, your code is working; it exhibits the correct behaviour. We have tested the code without looking at any code.
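As a rough sketch of what such a behaviour test could look like (the solver interface, case setup, and tolerance below are entirely hypothetical and only serve to illustrate the idea), imagine we have a driver function that runs a 1D heat conduction case for which the analytic solution is known:

#include <cassert>
#include <cmath>
#include <vector>

// assumed to exist somewhere in the codebase; we treat the solver as a black box
std::vector<double> runHeatEquationCase(int numberOfCells);
double analyticSolution(double x);

void testSolverMatchesAnalyticSolution() {
    const int numberOfCells = 100;
    const double tolerance = 1e-3;

    // run the simulation without knowing anything about its internals
    std::vector<double> numericalSolution = runHeatEquationCase(numberOfCells);

    // compute the RMS error against the analytic solution at the cell centres
    double rmsError = 0.0;
    for (int i = 0; i < numberOfCells; ++i) {
        double x = (i + 0.5) / numberOfCells;
        double difference = numericalSolution[i] - analyticSolution(x);
        rmsError += difference * difference;
    }
    rmsError = std::sqrt(rmsError / numberOfCells);

    // the observed behaviour is correct if the error is within an acceptable range
    assert(rmsError < tolerance);
}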
White-box testing
White-box testing is the opposite; in this case, we don’t care about the overall behaviour of the code but rather that the code itself is working as expected. This typically results in us writing test code that captures what we think a function should be doing, and then systematically testing the function for different values.
The test function that we provided above to test that our interpolation function is working correctly is an example of white-box testing. We know about the interpolation function code (and thus can reason about what we expect it to do), and we know its arguments and return type, so based on different inputs we expect it to produce a certain output. We used assert statements to ensure the correct working of the function.
If the interpolation function is working correctly, though, that doesn’t ensure that the entire code base is working as expected. We can have a well-organised test base that checks that our code is doing what it should, but if we then combine these pieces of code in a way that implements incorrect physics, we end up with non-physical results.
As a result, we need both white-box and black-box testing to ensure that our codebase is working as expected. In the next section, we look at the different types of tests we have available and see which ones are white-box and black-box testing approaches.
Types of automated tests
As alluded to above, we now need to choose how we want to test our code. Different approaches will test different aspects of our code, which can be classified as either implementation (white-box) or behaviour (black-box) testing. The list below contains the most common types of tests, but you may come across more for niche-specific applications. For us, the list below will be sufficient to write our own tests.
Unit tests
Unit tests are at the heart of testing and the tests you will write most. They test, as the name suggests, individual units, which simply means individual functions or methods within classes. The test that we wrote above for the getFaceCentroidValue() function is an example of a unit test, where we explicitly tested a single function and its return value.
By definition, since we are testing a function and making sure that its inner logic is producing the correct result, we are testing the implementation and thus unit testing is classified as a white-box testing approach. They will also make up the bulk of your testing suite.
Should you test every function with unit tests? The simple answer is no, you shouldn’t; you only test those functions that are critical and contribute to the observed behaviour of the system, and you skip functions that are so simple that a test would be overkill. The observed behaviour for us is that our CFD solvers correctly solve a particular case, a mesh generator is able to produce a mesh, and so on.
A function that prints the content of a class for debugging purposes (or any operator overloading implementation for operator<<()) is an example of a function that is not critical for the solver to produce correct outputs. A set() or get() function is an example of a function which is so simple that testing it may be considered overkill. Consider the following example:
double getCFLNumber() { return _cflNumber; }
We can probably all agree that writing a test for this function is unnecessary, as it is very difficult to introduce a bug here while still being able to compile the code. However, sometimes we want to test a different function for which we may need to use set() or get() functions to get the right information from a class. In this case, we are still not testing these setters and getters, but rather use them to get the information from the other function we want to test.
Be warned. Tools exist that will give you a test coverage metric, i.e. they will count every line of code your tests are covering and compare that against the lines of code that are not tested. This metric, typically measured in percent, is good for finding parts of your code that are never tested, for example, checking that all branches of an if/else statement are exercised. However, whenever we get a number in percent, there is a tendency to push it towards 100%, at which point we start writing tests that are completely pointless and don’t serve any purpose other than increasing the test coverage.
Let’s look at another example. Imagine we want to write a class to read in an STL file, which contains triangles and their normal vectors and is used to represent geometries. Mesh generators like OpenFOAM’s snappyHexMesh will use this CAD format to generate a body-fitted volume mesh around the geometry. We may have the following (simplified) class interface:
class ProcessSTL {
public:
    ProcessSTL(std::filesystem::path filename);
    ~ProcessSTL();

    void readFile();
    void fillHoles();
    void mergeFreeEdges(double tolerance);

    std::vector<std::vector<std::vector<double>>> getTriangles();
    std::vector<std::vector<double>> getNormalVectors();
};
Which of these functions need a unit test? The constructor and destructor are automatically called whenever we create an object of the class, so these will be implicitly tested. The readFile() function is critical for the observed behaviour, and so we want to test it. In this case, we need to use the two getter functions, getTriangles() and getNormalVectors(), within our test to get the triangles and normal vectors. Once we have these available in our test, we can check that each triangle has been read with the correct vertices (coordinates) and normal vector, which we can obtain manually from the file itself. Thus, we have an expected and actual value and can use an assert to ensure both are the same.
The functions fillHoles() and mergeFreeEdges(), without going into detail about what these functions would do, would also need to be unit tested. The two getters would not need their own tests; as we saw above, they will be implicitly tested during the other tests in some way. But even if they were not tested anywhere, a get() function does not need to be tested if it consists of only a single instruction/line of code.
The anatomy of a unit test
Since unit tests are so fundamental, and since they only test individual functions, their structure is fairly predictable, and a variety of methods exist for writing a unit test. It is best to stick with one, and I will use the AAA structure here, which is a very common unit testing approach. AAA stands for Arrange, Act, Assert. Thus, we have three sections in each unit test. Let’s look at the testGetFaceCentroidValue() unit test again, which I have copied below for convenience.
void testGetFaceCentroidValue() {
    assert(getFaceCentroidValue(0.0, 0.00, 1.0, 1.00) == 0.00);
    assert(getFaceCentroidValue(0.0, 0.25, 1.0, 0.75) == 0.25);
    assert(getFaceCentroidValue(0.0, 0.50, 1.0, 0.50) == 0.50);
    assert(getFaceCentroidValue(0.0, 0.75, 1.0, 0.25) == 0.75);
    assert(getFaceCentroidValue(0.0, 1.00, 1.0, 0.00) == 1.00);
}
Currently, this test is doing everything at the same time, which makes it compact but also more difficult to distinguish the inputs from the expected outputs. Reformatting it into the AAA style would look like:
void testGetFaceCentroidValue() {
    // Arrange
    double phiOwner = 0.0;
    double phiNeighbour = 1.0;

    std::vector<double> distanceOwner {0.00, 0.25, 0.50, 0.75, 1.00};
    std::vector<double> distanceNeighbour {1.00, 0.75, 0.50, 0.25, 0.00};

    std::vector<double> expectedValue {0.00, 0.25, 0.50, 0.75, 1.00};
    std::vector<double> receivedValue(5);

    // Act
    for (int i = 0; i < 5; ++i)
        receivedValue[i] = getFaceCentroidValue(phiOwner, distanceOwner[i], phiNeighbour, distanceNeighbour[i]);

    // Assert
    for (int i = 0; i < 5; ++i)
        assert(receivedValue[i] == expectedValue[i]);
}
We see that we do all the required setup in the Arrange section, followed by calling the function we want to test in the Act section, and then we check that the return values of our function under test are correct in the Assert section. Typically, though, we don’t bother with providing a range of inputs and testing all of them; a typical unit test would contain just a single value for each input, so a simplified version of the above code may look like this:
void testGetFaceCentroidValue() {
    // Arrange
    double phiOwner = 0.0;
    double phiNeighbour = 1.0;

    double distanceOwner = 0.25;
    double distanceNeighbour = 0.75;

    double expectedValue = 0.25;

    // Act
    double receivedValue = getFaceCentroidValue(phiOwner, distanceOwner, phiNeighbour, distanceNeighbour);

    // Assert
    assert(receivedValue == expectedValue);
}
We will use a testing framework later, and we will see that these frameworks allow us to inject different values for distanceOwner and distanceNeighbour, so we can keep our unit test code structure simple while still testing all boundary cases.
The above test code also highlights another thing: the golden rule of unit testing is that we should have only a single statement in the Act section. If we have to call more than one statement here, it either means we are testing more than one unit (which is not a unit test anymore, but an integration test, see the next section), or we have a poor code design, requiring us to call several functions before we can use the function we want to test.
Say, for example, we introduced another function into the ProcessSTL class we looked at above, and we named it openSTLFile(). If we now want to test the readFile() function, we have to remember to first call the openSTLFile() function, and thus we would have two calls in the Act section. This is poor code design. Opening a file is part of the class setup, and thus it should be implemented in the constructor of the class and would, again, be implicitly tested.
Integration tests
As hinted at above, integration tests are, by definition, tests that involve more than one function. The rules here are not hard and fast; this could mean that we want to test all functions within a class, or it could mean that we want to test only a few functions within a class together with a few functions from another class. The approach taken here will depend on the observable behaviour.
Since we are moving towards behaviour testing, but still look at individual components and make sure their implementation is working correctly, integration tests sit somewhere on the spectrum between black-box and white-box testing.
Let’s say we are writing a mesh generator which requires an STL file to generate a body-fitted volume mesh around. We could create the geometry of a box with a bounding box of [-1,-1,-1] and [1,1,1]. An integration test could now check, for example, that there are no vertices in the volume mesh that are within the bounding box. This region is occupied by the geometry and thus no mesh is allowed to penetrate into this region.
We are not yet testing if the mesh is correct, but we are testing that one component of the mesh creation process is correct, and thus we would label this an integration test.
Another typical example is that of a linear algebra library, something we looked at already. We have a vector and matrix class and we want to make sure that all operators are working correctly. If we want to test that the vector-matrix multiplication is working correctly, we need to test two separate classes in conjunction, and thus a vector-matrix multiplication test would be, by definition, an integration test.
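A sketch of such an integration test is given below; the Matrix and Vector interfaces are hypothetical and will differ from whatever your own linear algebra library provides, but the idea of exercising two classes together remains the same:

#include <cassert>

void testMatrixVectorMultiplication() {
    // both the Matrix and the Vector class are exercised together,
    // which is what makes this an integration rather than a unit test
    Matrix identity(3, 3);
    identity.setDiagonal(1.0);

    Vector x(3);
    x[0] = 1.0;
    x[1] = 2.0;
    x[2] = 3.0;

    // multiplying by the identity matrix must return the vector unchanged
    Vector result = identity * x;

    assert(result[0] == 1.0);
    assert(result[1] == 2.0);
    assert(result[2] == 3.0);
}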
System or end-to-end tests
The highest level we can achieve with automated testing is system tests, sometimes also referred to as end-to-end tests. These are entirely behaviour-driven and test that the observed behaviour of the code is correct. We have already written one system test, though you may not have realised it. When we implemented the linear algebra solver library, we wrote a function to test the conjugate gradient algorithm implementation. Specifically, we implemented a 1D heat equation code, which we used as a test.
Given the nature of a system test, it does not know anything about the underlying code it is testing; there is an expected behaviour that this code should achieve, and thus system tests are classified as black-box testing.
System tests are small versions of the actual application you are trying to build. For example, this may be running a specific test case and ensuring that the output is within acceptable ranges (e.g. the lift and drag coefficients for an aerofoil simulation). Since these tests are rather time-consuming to execute, you would only have a few of them, or run them less frequently than your unit and integration tests.
Even if they are time-consuming to run, system tests provide us with the best protection against programming mistakes. We may not be able to pinpoint a problem easily due to their black-box nature, but we can quickly identify whether our code is working as expected. Adding unit and integration tests to the mix then allows us to identify the area where our code is producing incorrect results, and while bugs can still slip through, we will eliminate most of them.
User acceptance testing
User acceptance tests can be broken down into two phases, though depending on the complexity and size of the application, there may be more steps involved. Even if that is the case, all of these steps can be classified as either alpha or beta testing, and these are discussed below.
These types of tests are typically run just before the release of the software and are purely manual. Different developers may have different approaches, and using a checklist, for example, is not uncommon. The reason we run user acceptance testing is to identify any bugs that were not anticipated because of how the software is used. This is typically the case for applications that feature a graphical user interface, which are notoriously difficult to test with any of the three automated test types discussed above.
Let’s have a look at alpha and beta testing in some more detail then.
Alpha testing
Once a software project has reached a point at which a new release is planned, all automated tests are run to ensure that the software is behaving as expected. If no more bugs can be identified, then it is time to start testing it as a user would.
Alpha testing happens in-house, meaning mainly the developers go through the software, testing different use cases that they know exist among users. It is a mixture of white-box and black-box testing, and if a bug is found, it is patched immediately. As mentioned before, graphical user interfaces are difficult to test automatically, and so this round of testing will cover, for example, whether the graphical user interface is working as expected.
Take ANSYS Fluent, for example: its graphical user interface is quite rich and offers a lot of different features. If you want to change the properties of the fluid, you go to the material properties and change them. You can reach the material properties from different points in the graphical user interface. If you have set them and later decide to change them, and you do so from the button within the boundary conditions, the graphical user interface will break in a spectacular manner (there may be more steps involved, it has been some time since I found this bug).
This demonstrates that different users may use the software differently. And if that isn’t covered in any user acceptance test, then it is not discovered and the bug makes it into production code.
Beta testing
After the developers have given the green light and found no more bugs in the software, it is released, typically to a small group of core users, for beta testing. They will then use it as if it were fully released software and look at the new features to see if they behave as expected for their use cases.
Users typically won’t have access to the source code itself, and thus this is a purely black-box testing approach. As a result, once a bug is found, it is documented but not fixed immediately. At the end of the beta testing phase, all bugs are reported back to the developers, who will then fix them.
Once the software is deemed to have passed all testing, there are two options. Either the software is released immediately to the general public, or a release candidate is released to a selected group of people. Unlike beta testing, where the software is specifically tested for use cases that break it, a release candidate is used by people who just want to use the latest version, with the understanding that some bugs may still be in the software. These may then still be reported to the developers.
There can be any number of release candidates, but typically one or two are found in different projects. After the release candidate, if indeed used, is found to be working as expected, the final release of the software is made, at which point everyone with access to the software can use it. This is also shown schematically below.
After the release of the software, there is no guarantee that there are no bugs in it; these will continuously be reported back to the developers, who will then start to work on fixes. Fixes and updates will continuously be deployed and integrated into the latest release. At this point, the developers start to work on new features for the next release, and the cycle repeats itself.
Regression testing and the software testing pyramid
Regression testing can probably best be described as a testing philosophy rather than a separate testing approach compared to the automated and manual tests described above. Ignoring user acceptance testing for the moment, regression testing tells us at what frequency to run our tests.
But to have an understanding of that, let’s first look at the software testing pyramid, which is shown below.
At the top, we have our system tests, which are followed by integration, and then finally, unit tests. The extent on the horizontal axis tells us the number of tests we need to have, relative to the other types of tests. We can see that the majority of our tests should be unit tests, followed by integration tests. The least amount of tests we should have are system tests.
The vertical axis shows us the time it takes to execute these tests. Unit tests must execute in a fraction of a second. We want to have hundreds, thousands, or tens of thousands of these unit tests so execution speed is really key here.
If we think about it, it makes sense. If unit tests are supposed to test each function individually and then integration tests are supposed to test different units (functions) in combination, we will have, by definition, more unit than integration tests. And, if system tests cover the entire code base, there may then be substantially fewer system tests than integration tests.
Looking back at regression testing and knowing the time it takes to execute each type of test, it is common to run all unit tests frequently during the development cycle. Depending on the complexity of the software, we can separate unit tests into separate groups and may choose to run only a subset of them, specifically those that test the class or classes we are modifying. If one of the tests fails, this is called a regression (a bug in code that previously worked), hence the name regression testing.
Once the feature is implemented and the unit tests no longer fail, integration and system tests may be run. I like to run the integration tests alongside my unit tests as well; there is no right or wrong here, as long as we run our tests systematically during development. Once we are happy that the feature we have implemented is finished, we run the system tests to ensure that the observed behaviour of the code has not changed.
Mocking dependencies
So far, we have assumed that all code can be easily tested, although we also pointed out that graphical user interfaces are notoriously difficult to test (that doesn’t mean they can’t be tested). But there are times when we just can’t provide a reliable test because we have dependencies that can’t be replicated. For console applications like a CFD solver, this is not as much of an issue, but if we look outside our application area, we will find countless examples.
Take, for example, a weather forecast mobile application. You may want to implement a feature to check if today is the hottest day on record. For that, you need two things: the ability to check the current temperature and a database containing all recorded maximum temperatures for the previous decades.
With what we have learned so far, we would write a test that checks what the temperature currently is (using some form of online request, typically through a REST API) and then checks what the hottest day on record is in the database. However, this approach misses the point and is actually pointless as a test.
The chances that today is the hottest day on record are slim, especially if we are developing this feature in winter. We don’t want to wait until summer only for us to be able to run our tests (and even then, it may not be the hottest day). So, whatever part in our code is responsible for updating the database with the new temperature is likely never executed and tested.
Even if we are developing in summer and the day we are testing happens to be the hottest day on record, we still don’t want to rely on data that we have to request over the internet. Such a request takes time, and thus our unit tests will be very slow to execute, especially if we perform a few of these web requests.
The other issue is the database; this is a database used in production, i.e. for our real application, and we don’t want to overwrite anything on it based on our tests. We also don’t want to use it because it may, again, be a large database, and reading and writing data to it may be too slow for our aim of unit tests not taking more than a fraction of a second to complete.
In other words, we have two dependencies in our test which we don’t want to use. This is where mocking comes in. A mocking object will pretend to be whatever you tell it to be, in this case, a response from a web server, or a class that has access to a database which it can read and write to.
When you make a call to the web service to get the current temperature, you are not passing in the actual web service, but the mocked object that now pretends to be this web service. You tell the mocked object which value to return whenever we call a function on the web service (or now mocked object) called getTemperature(). Since we can hardcode this value, we can test different parts of our code, i.e. we can test what happens if today is the hottest day or not. We get to exercise all parts of the code.
For the database, we may then have a class that has two functions, getHottestDateOnRecord() and writeHottestDateOnRecord(). We can hardcode what the hottest day on record is, and with the mocked web service dependency, we know exactly which part of our code will be executed. We can write the hottest day to the database if we want and mock that part as well.
If we want to apply that now to a more CFD-based solution, we have several possibilities. If we say that any dependency is bad and we want to avoid it, then we can mock every dependency in a unit or integration test. For example, if we want to test our vector-matrix multiplication example from earlier, and we decide that this calculation belongs in the matrix class, then the vector class is a dependency and needs to be mocked. There is no need to do so, as the vector class can be cheaply executed, but it is simply a matter of taste.
These two testing philosophies can be classified into the classical and the London school of unit testing. In the classical school approach, we allow for dependencies, and so in the matrix-vector multiplication test, we would use the real vector class. In the London school approach, we replace all dependencies with mocks, as long as they are mutable. Mutable is a fancy computer-science term for "can change", and here it simply means that the object can change its internal state over time. We may change the values in the vector, and so it is mutable.
I personally favour the classical school of unit testing, because the point of testing is to exercise the code as much as possible, so why would I mock away dependencies that give me additional protection against regressions? This is a personal choice and you may disagree, but if you do follow the same approach as me, you will find that there is rarely, if ever, a need to use mocks.
Summary
In this article, we looked at the most common tests that we want to execute during any software development cycle. Starting with manual tests and why they are perhaps not the best choice, we looked at automated tests and how we can integrate them into our development workflow. We classified our tests into white-box and black-box testing approaches and showed that a combination of them gives us the best protection against software defects.
Specifically, we looked at unit, integration, and system (or end-to-end) tests, and saw how many of each we need relative to each other to perform regression testing, an approach that protects us from unwanted bugs that creep in during software development. We also looked at user acceptance testing and how it requires manual effort but provides additional protection against regressions that are difficult to capture in automated tests.
Finally, we also looked at mocking, a concept less frequently required for pure console-based applications but one that may be important for mocking away any dependencies that exist in the code but should not be used during testing.
Tom-Robin Teschner is a senior lecturer in computational fluid dynamics and course director for the MSc in computational fluid dynamics and the MSc in aerospace computational engineering at Cranfield University.