Documenting code: why bother and how to do it right

In this article, we will explore the three main components that make up code documentation. We will look at why providing documentation is important and how it improves the chance that others will start using your code. We will also look at situations where writing documentation for its own sake leads to pointless documentation; this sets the scene for a later article, where we will look at a better approach to code documentation.

By the end of this article, you should have developed an initial appreciation for why code documentation is important. If you believe that documentation is just a necessity and not as important as the source code itself, then think again and read on. This opening article provides some initial thoughts, and we will explore the topic further throughout the series.

The art of documenting code

Let’s start this series with a more philosophical question. Why do we write code? Coding is the process of taking some form of input data and using algorithms to work on this data to produce some output. In the domain of CFD, we have our solver settings, boundary conditions, and the computational mesh as input data. The algorithm that we use to transform that data into some useful output data (lift and drag coefficients, for example) is governed by the Navier-Stokes equations.

The process of transforming the input data into output data can be made arbitrarily complex, and all major CFD codes make use of libraries in one way or another. These libraries provide dedicated solutions to problems faced by CFD developers, such as solving a linear algebra system of equations or reading a mesh from disk, which are examples of two libraries we have developed before.

When we make use of someone else’s code, we expect to be provided with some form of documentation that we can use to learn how to use that piece of code. This may be a reasonable expectation to have, yet 93% of developers complain about incomplete or confusing documentation.

If you have ever written a substantial piece of code, say 1000+ lines of code, think about it for a minute. If you had to document it now in a way that others can use it, what would you be writing about? If you have ever faced that question, you will realise that this can be a seemingly easy question with a non-trivial answer.

In my experience, there are two main reasons why we have incomplete or confusing documentation, particularly in open-source projects as mentioned in the blog post above.

The first reason is that many developers are not primarily software engineers but rather physicists, mathematicians, engineers, etc. They may have learned how to program and can write code that solves complex physical processes (such as solving the Navier-Stokes equations, for example), but knowing how to write code is just one aspect of software engineering. They probably have never come across good software engineering practices and are unaware of DevOps, so code documentation may be something they are simply unaware of.

The second reason is that those developers who are aware of documentation and know its value can’t be bothered to write it in the first place. It is seen as a burden, something that no one wants to do, and so any documentation is quickly thrown together. As long as the code is documented, we have ticked the box and are done, right?

Well, here is some food for thought: documentation is just as important as the actual code. If you find this statement controversial, this probably means that you are not a software engineer, but rather have a background in physics, maths, engineering, etc. From now on, you should include writing documentation as part of writing your code. It is that important, and it is also the reason this topic gets its own full-blown series. Mike Pope, a technical writer who writes documentation for a living, summarises this point well:

We’ve been known to tell a developer “If it isn’t documented, it doesn’t exist.” […] Not only does it have to be doc’d, but it has to be explained and taught and demonstrated. Do that, and people will be excited — not about your documentation, but about your product.

Mike Pope

This gives us an idea about what we need to do with our documentation: explain, teach, and demonstrate. These ingredients need to be somewhere in your documentation, and this is what we will look at throughout this series. But what happens if we don’t provide code documentation?

You could argue that the code you write is only for you, but even then, I would argue you still need at least some basic documentation (and we will look at different types of documentation below). The main reason you should care about documentation, however, is that other people should be able to work with your code, and documentation is the first place anyone will look to get an idea of how to use it. Mike Pope summarises the consequences of missing documentation in the following way:

No matter how wonderful your library is and how intelligent its design, if you’re the only one who understands it, it doesn’t do any good. Documentation means not just autogenerated API references, but also annotated examples and in-depth tutorials. You need all three to make sure your library can be easily adopted.

Mike Pope

So, hopefully, you get an idea of why code documentation is important. We will look at this in some depth in this series. I know documentation does not generate the same excitement as learning how to implement a Navier-Stokes solver or different turbulence models, but it is a necessary tool that we, as CFD developers and engineers, need to know about to communicate our work. We will look at the different types of documentation next, and then look into these in more detail in subsequent articles.

Types of code documentation

Not all documentation is created equal, and there are three main types you will find in the wild. These are:

  • A 1-page catch-all readme file: This is the minimum amount of documentation you should provide. It is a single page containing everything a user needs to know to get started.
  • A user guide: For larger projects, you may want to provide a user guide that details how to use your code with tutorials, explanations, and anything else you deem necessary for someone to know if they have never touched or seen your code.
  • Direct code documentation: This is a common type of documentation where we annotate the source code itself. Automated documentation tools will use these comments to generate HTML or PDF documentation automatically. This often leads to the misuse of code documentation tools and as a result bad documentation but, if used appropriately, it can result in clear documentation.

Let’s have a look at these different types of documentation in more detail.

The 1-page catch-all readme file

As mentioned above, each software project should, as a minimum, provide a succinct 1-page readme file that covers everything from what problem the code/library solves to how to use it. Robert Ramey has developed a set of slides that summarises the 1-page documentation process quite well; if you are interested, I have linked to his talk, given at CppCon 2017, below.

In his talk, Robert Ramey suggests that we use the following sections in our 1-page catch-all readme file:

  • Introduction – purpose of the library (code): This section describes what problem this library or code is solving. The first sentence should be a description that summarises the entire code in a single sentence. Any following sentences may provide additional background and motivations for this library/code.
  • Motivating example(s) with explanations: Once the reader has an interest in using your library/code, you should show how to use it with examples. The more you provide here that covers all use cases of what your library can do, the more uptake it will receive.
  • Notes: Anything that doesn’t fit anywhere else should go in the notes section.
  • Rationale: Sometimes you have to make coding decisions to do something a certain way that may not seem obvious. The rationale section can be used to document these non-obvious coding decisions.
  • References: Anything you want to reference should go here. This can be reference papers or other libraries that are used.

I find this list is a good starting point but it does not cover all aspects that I want to include in my documentation, and some sections may not be really relevant in a 1-page readme file. Granted, when Robert Ramey talks about these concepts, he does so in the context of documenting a single class, which may be different from documenting an entire codebase (i.e. a library or solver). So, I typically use the following structure:

  • Introduction: Same as the description above, i.e. a single sentence summarising what this library/code is doing with some additional background information.
  • Installation: This section shows how to install the library code. For interpreted languages such as Python, this may be as simple as downloading the code from a remote repository such as GitHub, but for compiled languages like C++, it is useful to include instructions on how to build the code, potentially with different instructions for different operating systems.
  • Usage/Examples: Anything you want your users to be able to do straight away after installing your library/code should be covered by an example with some additional explanations as required. This section should contain all of these examples/tutorials. These may be grouped into sub-sections if you have a few examples to go through.
  • References: Same as the description above.

To exemplify this, have a look at the pyGCS library, which I developed a while back to compute the grid convergence index (GCI). This package follows the above scheme and provides, hopefully, a clear picture of what the code can do. You will also see that the library is averaging a good number of downloads per month, and it is unlikely that the same uptake would have been possible without proper documentation.

Thus, if you create your 1-page documentation with the section headings provided above, chances are that you will write documentation that helps users to adopt and use your code.

The user guide

The user guide is an entirely different beast. While the 1-page catch-all readme file tries to succinctly summarise the library/code in a format that can be consumed within a few minutes of reading, a user guide may take substantially longer to go through. It is much more detailed and tries to provide more reasoning and explanations, along with detailed code examples and usage.

A user guide may only be required once your project grows to a substantial size. For a small project (1000 lines of code or less), a user guide may be overkill and the 1-page documentation described above should be sufficient. However, for anything larger than that, a user guide may be a good idea.

The structure of the user guide may follow a similar structure to our 1-page documentation outlined above, or it may follow an entirely different format, whichever you find documents your code the best way. Whichever way you adopt, ensure that your documentation explains, teaches, and demonstrates how to use your code so that its usage will be clear.

To follow this up with an example again, I developed a script (and the word script is probably an understatement) that takes some physical inputs such as the inlet velocity, boundary conditions, turbulence intensity, etc., and transforms them into a full OpenFOAM case setup. If you have ever set up a RANS simulation yourself in OpenFOAM and needed to switch the turbulence model, you will understand the pain and appreciate an automated way of generating all input files.

It was supposed to be a simple script initially, but as I added more and more functionalities that went far beyond what I originally intended to do and implement, I figured that the 1-page documentation approach was not fit for purpose anymore and so I decided to write a user guide instead.

This user guide still features the motivation and background for the project, a quick start guide with examples and then it describes in more detail what I mean by policy-driven case setups and what inputs are available in the main case setup file, as well as how to write your own case setups. All of this used to be in the single readme file and just got way too large, especially after I decided that I wanted to add more examples on how to set up your own cases.

So, using a user guide is useful for larger projects where you want to take some more time to go through examples and where the users have to grasp some concepts first in order to understand what they need to do to work with your software.

Going low level: Documenting the source code directly

As alluded to above, documenting the source code is another popular method of providing documentation. It is the most labour-intensive, as the documentation has to be written as we write the code itself, but it is a popular choice since tools are available that can read and process the source code documentation and directly generate HTML or PDF documentation as an output.

It is a good idea, in general, to keep the documentation next to the code itself; that way, you are far more likely to remember to update the documentation when you refactor the code. However, most programmers do not care about the documentation enough to put a lot of effort into it and, given the amount of documentation required, it can be a tedious exercise to generate high-quality source code documentation.

To make this more concrete, let’s consider the following example code without any context, and see if you can figure out what the code is doing:

std::string formatter(int a, int b, int c) {
    std::stringstream d;
    d << std::setfill('0') << std::setw(2) << a << "-";
    d << std::setfill('0') << std::setw(2) << b << "-";
    d << std::setfill('0') << std::setw(4) << c;
    return d.str();
}

No? Ok, how about the following improved example:

std::string padStringWithZero(int value, int padding) {
    std::stringstream paddedString;
    paddedString << std::setfill('0') << std::setw(padding) << value;
    return paddedString.str();
}

std::string formatDate(int day, int month, int year) {
    std::stringstream date;
    date << padStringWithZero(day, 2) << "-";
    date << padStringWithZero(month, 2) << "-";
    date << padStringWithZero(year, 4);
    return date.str();
}

How about that? Hopefully, we can agree that the second option is much clearer. We essentially take in a day, month, and year, and want to output a string in a specific format. Our CFD solver may need to log information to a console or log file, and we may want to have a unified way of printing it.
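To make the logging use case a little more tangible, here is a minimal, self-contained sketch of how the two helpers might be called. The logLine function and the message text are hypothetical; they are introduced only for this illustration and are not part of any solver discussed in this series.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Pad an integer with leading zeros up to a minimum width of `padding`.
std::string padStringWithZero(int value, int padding) {
    std::stringstream paddedString;
    paddedString << std::setfill('0') << std::setw(padding) << value;
    return paddedString.str();
}

// Format a date as DD-MM-YYYY.
std::string formatDate(int day, int month, int year) {
    std::stringstream date;
    date << padStringWithZero(day, 2) << "-";
    date << padStringWithZero(month, 2) << "-";
    date << padStringWithZero(year, 4);
    return date.str();
}

// Hypothetical helper, introduced only for this example:
// prefix a log message with the formatted date.
std::string logLine(int day, int month, int year, const std::string& message) {
    return "[" + formatDate(day, month, year) + "] " + message;
}
```

With these definitions, logLine(7, 3, 2024, "Residuals converged") returns the string [07-03-2024] Residuals converged, so every log entry shares the same date layout.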

How would we document this? Let’s look at the first example. We are using a code documentation syntax here that Doxygen can understand. Doxygen is an automatic documentation tool that is particularly popular for C++-based projects. We ignore the specific syntax for the moment, although it should not be difficult to guess what the documentation is doing here.

/// A function that takes in a day, month, and year, and returns a formatted date
/** 
  * \param[in] a A given day
  * \param[in] b A given month
  * \param[in] c A given year
  *
  * \return A formatted date
  */
std::string formatter(int a, int b, int c) {
    std::stringstream d;
    d << std::setfill('0') << std::setw(2) << a << "-";
    d << std::setfill('0') << std::setw(2) << b << "-";
    d << std::setfill('0') << std::setw(4) << c;
    return d.str();
}

Ok, so the documentation helped here, but, as we saw before, if we just spend some time making the code readable, then the additional documentation will not add any value. Let’s repeat the documentation exercise for the readable version:

/// Function to pad any integer value with leading zeros and return that as a string
/**
  * \param[in] value The value that should be padded with zeros
  * \param[in] padding The total length of the string that needs to be padded
  *
  * \return The padded value as a string
  */
std::string padStringWithZero(int value, int padding) {
    std::stringstream paddedString;
    paddedString << std::setfill('0') << std::setw(padding) << value;
    return paddedString.str();
}

/// A function that takes in a day, month, and year, and returns a formatted date
/** 
  * \param[in] day A given day
  * \param[in] month A given month
  * \param[in] year A given year
  *
  * \return A formatted date
  */
std::string formatDate(int day, int month, int year) {
    std::stringstream date;
    date << padStringWithZero(day, 2) << "-";
    date << padStringWithZero(month, 2) << "-";
    date << padStringWithZero(year, 4);
    return date.str();
}

Looking at this code documentation, you should immediately feel that something is not right. Look at the body of formatDate again, and then read its one-line description and the three \param entries above it. We are doing a lot of repetition here! Granted, we are not repeating code, but we are restating the code in the documentation. This is, unfortunately, a rather common type of code documentation encountered in the wild, and it is not very useful.
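For contrast, here is a rough sketch of comments that state what the signatures cannot tell us. It is only one possible rewrite and not the approach developed later in this series. It assumes that the DD-MM-YYYY layout and the absence of input validation are deliberate, which the original code does not tell us; the example values in the comments are my own.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

/// Pad an integer with leading zeros up to a minimum width
/**
  * The width is a minimum, not a maximum: values with more digits than
  * \p padding are returned unchanged and are never truncated. Negative
  * values are not treated specially, so the zeros end up before the sign.
  *
  * \code
  * padStringWithZero(7, 2);     // "07"
  * padStringWithZero(12345, 4); // "12345"
  * padStringWithZero(-5, 4);    // "00-5"
  * \endcode
  */
std::string padStringWithZero(int value, int padding) {
    std::stringstream paddedString;
    paddedString << std::setfill('0') << std::setw(padding) << value;
    return paddedString.str();
}

/// Format a date as DD-MM-YYYY so that all log entries share one layout
/**
  * \note Assumption for this sketch: inputs are not validated on purpose.
  *       formatDate(31, 2, 2024) happily returns "31-02-2024", so callers
  *       are responsible for passing a date that exists.
  */
std::string formatDate(int day, int month, int year) {
    std::stringstream date;
    date << padStringWithZero(day, 2) << "-";
    date << padStringWithZero(month, 2) << "-";
    date << padStringWithZero(year, 4);
    return date.str();
}
```

None of these comments repeat a parameter name; each one records a behaviour or a decision that a reader could otherwise only discover by experimenting with the code.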

To see why this isn’t useful, consider the following example: we are writing our code documentation using the same pattern as shown above, and finally arrive at the following function:

float q_rsqrt(float number)
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;
  i  = 0x5f3759df - ( i >> 1 );
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) );

  return y;
}

So, we finish writing this function and then provide the documentation just as we did before:

/// Compute the inverse square root of a number
/**
  * \param[in] number A number of which to compute the inverse square root
  *
  * \return The inverse square root of the input number
  */
float q_rsqrt(float number)
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;
  i  = 0x5f3759df - ( i >> 1 );
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) );

  return y;
}

Great, job done, we documented the function, and move on, right? Well, look at the code and tell me if you immediately spotted that this is calculating an inverse square root, i.e. 1/sqrt(number). And, if you did, did you also notice that it only approximates the result?

This function is not at all trivial, and some additional explanation is required. Why do we write our own inverse square root algorithm in the first place? And why does it work this way? This is all information that usually goes missing if we simply annotate the code for the automatic documentation tool to generate the documentation. So there is a danger here if we document the code without thinking about the purpose of the documentation.

A more useful description here would have been a link to the algorithm itself, and some additional reasons why we compute the inverse square root ourselves. In this case, the algorithm was designed as a much faster way to compute the inverse square root than evaluating 1.0F / std::sqrt(number);, but it is also not as exact. However, there are applications where an approximation is preferred over an exact value if the computational cost can be reduced significantly, and this should be captured in the documentation.
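As a rough sketch of what such documentation could look like, consider the version below. The function name fastInverseSqrt and the wording of the comment are my own choices for illustration. The sketch also swaps the pointer casts for std::memcpy and the long for std::uint32_t, because long is 64 bits wide on many platforms and the casts are not well-defined C++; the arithmetic itself is unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

/// Approximate 1/sqrt(number) with the "fast inverse square root" bit trick
/**
  * Why not 1.0F / std::sqrt(number)? The trick, widely known from the
  * Quake III Arena source code, trades accuracy for speed: it builds a
  * first guess from the bit pattern of the float and refines it with a
  * single Newton-Raphson step. The relative error is on the order of 0.2%,
  * which is fine for tasks such as normalising vectors but not where
  * full float precision is needed. Whether it is actually faster on a
  * given machine should be benchmarked rather than assumed.
  *
  * \pre number must be positive and finite
  *
  * \param[in] number The value whose inverse square root is approximated
  *
  * \return An approximation of 1/sqrt(number)
  */
float fastInverseSqrt(float number) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "the bit trick assumes a 32-bit float");
    assert(number > 0.0F);

    const float halfNumber = 0.5F * number;

    // Reinterpret the float's bits as an integer without undefined behaviour.
    std::uint32_t bits;
    std::memcpy(&bits, &number, sizeof(bits));

    // Magic constant minus half the bit pattern gives a first guess.
    bits = 0x5f3759dfU - (bits >> 1);

    float estimate;
    std::memcpy(&estimate, &bits, sizeof(estimate));

    // One Newton-Raphson iteration to refine the guess.
    return estimate * (1.5F - halfNumber * estimate * estimate);
}
```

For an input of 4.0F, this returns a value close to, but not exactly, 0.5. The comment now answers the two questions raised above: why we do not use the standard library, and how much accuracy we give up in return.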

Remember the words of Mike Pope given above? Documentation should explain why we are doing what we are doing, not simply annotate the code. We will look in much more detail at providing code documentation in a later article, and I will show you what I believe to be a better solution to the current problem.

Summary

This concludes our first article on code documentation. It is not hard to write good documentation, but we need to know which type of documentation is best suited for which type of code and how to structure it.

We most commonly use the 1-page readme file for smaller projects, and even for larger projects, to provide a quick overview of what problem the code solves, how to install it, and how to use it. It should provide a quick start to using the code and not much more than that.

If we decide that we need some more in-depth discussions, explanations, or tutorials, then we may want to supplement the code with a user guide. Here we have some more space to explain more complicated concepts and how to use the code. This approach is useful if providing input data is not that straightforward and requires some additional discussion.

Finally, we can opt to annotate our source code directly, which will act as code documentation in place. This is a common approach adopted by larger projects. Unfortunately, it does not always lead to usable documentation, and quite often the generated documentation is rather useless.

To avoid writing excellent software that no one adopts because the documentation is not helpful, we will look at a few tools and techniques that will build upon the three different types of documentation we may encounter. Following the steps outlined in this series will help you provide documentation that is useful and serves its purpose of documenting your code, explaining what it does, which ultimately results in better uptake of your code.


Tom-Robin Teschner is a senior lecturer in computational fluid dynamics and course director for the MSc in computational fluid dynamics and the MSc in aerospace computational engineering at Cranfield University.