cfd.university

Templates are one of the most powerful features of C++, which allow you to retain the type-safeness of the language while increasing your speed significantly. In this article, we will look into what they are, but also, why common examples used to explain templates don't really do them justice. We will compare templates against class hierarchies and inheritance and see that templates can shave off a significant amount of time from your CFD simulations if used correctly. Finally, we'll also look into how templates are used in OpenFOAM and compare that with a simplified example.

In this series

In this article

Introduction

We finally made it to templates, probably my favourite thing to talk about in C++. If you have no idea what templates are, then strap in, it's going to be a wild ride. If you have already looked into templates and think they are not that difficult, then either you have just scratched the surface or your name is Bjarne Stroustrup (a good name to remember).

It took me ages to properly understand templates, getting the basics is not difficult, but then getting into the details is a bottomless pit. I once interviewed for a software engineering position and remember discussing templates, I was talking about a somewhat advanced template concept, but the person interviewing me had no idea what I was talking about. So apparently, you don't need to know what templates are, but I would encourage you to get the basics and a bit beyond that, which I am covering in this section. Templates are really useful and, in resource-intense applications such as CFD solvers, templates become a very powerful tool to increase performance.

C++: A type-safe language

Before we start looking at any template code, I want to explore the concept of type safety. In programming, we can have two situations; either our programming language is type-safe or dynamically typed. Type-safe languages enforce a data type on variables, and under normal circumstances, we can't change them. For example, C++ is a type-safe language, meaning once we have assigned a type to a variable, we can't change the underlying type. Let's look at an example:

There are a couple of things to unpack here. On the second line, we say that pi is of type double. So when we try to assign it a sequence of characters on line 3, we get a compiler error saying that we can't convert from a char to a double. And this makes sense, how would you translate a sequence of characters into a floating-point number? There may be use cases, but certainly no common ones, so C++ does not provide any implicit conversion for us. On line 4, on the other hand, we introduce piApproximately, which is of type int, and we assign it the value of pi. Converting from a double to an int can be done, i.e. we simply chop off everything after the dot, and now we have piApproximately = 3. This shows us that C++ can convert between certain data types implicitly, but only if this makes sense. For all other data types, though, C++ will not know what you want to do, and so a compiler error will be raised, not generating your executable.

Dynamically typed languages, on the other hand, do not check for your type. Python is one such example, and in Python, it is perfectly reasonable to do the following:

Now the second line works, i.e. we can just change the type of pi, however, the third line will fail, as there is no operation defined between a floating-point number and a string. Dynamically typed languages allow reusing variables, but this comes at the cost of runtime overhead. Every time we want to do something with our variable, the interpreter (in Python's case) needs to check if what we want to do can actually be done. As we saw on line three above, the interpreter will check if we can subtract a float from a string, and then it will throw an error and correctly say that we can't. This runtime overhead slows our code down, and so if we have performance-critical applications, like CFD solvers, we really want to have a strongly typed language (which is another way of saying type-safe) to reduce runtime overhead. This means all type checking is now performed at compile time, and we are thus slowing down our compilation time.

A first look at templates: The boring (and dangerous) calculator example

With this in mind, we are ready to look at the first use of templates. Let's also go straight into an example you most commonly find: developing a calculator. We'll look into this example, and then I'll let you know why I think this example is pretty bad at explaining what templates are. By the end of this article, I hope to have convinced you of this. Suffice it to say, that people who teach templates with the calculator example probably do not understand templates themselves.

The argument goes that if you want to develop a calculator, you may have code like this (I only show here code for addition, but you get how to extend this example to subtraction, multiplication, and division):

Let's say you also want to add integer addition, so you would add a function of the form int add(int a, int b) { return a + b; }. And why stop there? You could also define a function for floats, and all the integer types such as long, unsigned, etc. We can define different versions of the same function, as long as they are not the same (otherwise we get a duplication of code compiler error). Providing different functions, but with, for example, different types for the function arguments, is called function overloading. To be explicit, we could write the following code:

Now we have two different implementations of the add function, and, depending on the variables that we put into the function call to add(), we will either call the integer or floating point version. We have previously discussed that we want to keep our code DRY (Don't Repeat Yourself), i.e. we want to avoid copying and pasting code, and we now have two very similar implementations. While I find it difficult to see how you could mess up this implementation, in real scenarios, you have more complicated functions, and if you then copy and paste code, you get a maintenance nightmare when fixing bugs in one of the functions but forget to copy the fix to the other function as well.

Code duplication is always bad, and so if we have a way to write the function definition once (as this is not changing) and then provide some mechanism to substitute different types, then we would get cleaner code as a result. And, well, we are in luck, because this is exactly what templates are for. Think about it this way, whenever we define a function (without templates), we can provide some arguments to the function like so:

Here, we have now hardcoded that our function takes 3 arguments of types int, double, and bool, in that order. Whenever we call the function, the values for a, b, and c can (and will) change, but their types always remain the same. So the types are fixed, but the values are variable. Templates extend our function by allowing certain, or all, variable types to be different and to be determined by the function call itself. Returning to our calculator example, this means we want to allow for different types (int, double, etc.) and to do so, we introduce a template on our function argument as shown in the following code.

On line 3, we define a generic type called Type (in the literature and live code, this is also commonly just abbreviated as T) and we use this type as a placeholder on line 4 for both the arguments a and b. If we call this template now with either floating-point numbers or integers, it will work. How does it work, when we said previously that C++ is a type-safe language? Well, what the compiler does is look for all function calls in your code (we only have one at the moment) and then it will figure out which function arguments were provided (in our case both function arguments are floating-point numbers). It will then take our template code and provide the actual implementation for the specific type that we have used. If we call the function with different types, it will provide a new implementation for each of these calls with new types, so the compiler is essentially writing code for you (and it does that much better than ChatGPT)! You can actually notice that by looking at the file size of your executable, it will grow with new implementations.

You are not limited to just one template, you can provide as many as you want. For example, if we wanted to add both floating points and integers together, the above template code would not work, as it assumes that both arguments are of the same type. But, if we changed the function to something like:

Then we could call the function with both floating point and integer numbers, as the revised example on line 5 shows (if you try to call line 5 in the previous example, you get a compiler error saying that you can't have different types). We had to choose a new return type here as well and said that whatever type variable a is, is also going to be our return type. This might not be the best choice, but with our limited knowledge on templates, this is sufficient for the moment).

Let's look at an example to see exactly what the templated function will do. Let's say we call our templated add function now in the following way

How many different (or overloaded) function definitions do we need in this case? We need 4. Did you get all 4? Let's look at all of them to make this clear:

and then calling the function 4 times with 4 different combinations of parameters, the compiler will detect all of them, and then use the template function as a blueprint to generate 4 implementations of that templated function. The compiler will never call the templated function itself, they are just a construct for us programmers to reduce code duplication. Remember, C++ is a type-safe language and thus every function that we call needs to know, at compile time, what the types of all of their variables are. Hence, template code will increase your executable file size (and, depending on your usage, also compile time if you have lots of them).

A closer look: Why the calculator example promotes bad template usage

The calculator example shown above shows, in essence, all there is to templates, but the devil is in the detail. I said that this is a pretty bad example in explaining templates; sure, the basics can be conveyed, but I want to provide a counter-example for a generic calculator class, which works for any type (as long as they are numbers, which we can safely assume for a calculator class) and I can mix and match different types such as floating point and integer numbers. Here it is:

If you looked at this example and then asked yourself, "What is actually different here compared to the first example?", then congratulations, you are on the right track. Nothing is different, it is the same code, and in my view, works much better than the template mess. Why? Because templates add complexity and in this case, it is really not needed.

Furthermore, our template implementation has introduced a subtle but nasty bug that I have ignored quietly. Did you spot it? If you program it, it will take you ages to spot. Thankfully, I have MSc and PhD students who constantly produce this bug, so I am quite good at spotting it by now! Let's explore this bug, to see why simplicity is always king. For this, we need to look at the division case. Below, I show two implementations for the divide function, one using templates (divideT()) and one using doubles (divide()), as well as associated calls to these functions. Remember that C++ can implicitly convert between floating points and integer values, so we can use a function with the type fixed to doubles and C++ will convert any inputs correctly to double for us. We will see in the code below why this is a good idea.

You'll notice that the call on line 10 produces a result of 0! Why is that? Well, we call the divideT() function with the arguments divideT(1, 2), C++ interprets these numbers as integer numbers. Since we defined the return type to be the same as the type of the numbers we pass into the function on line 4, the function call will calculate 1 / 2 = 0.5 and then write that into an integer. The integer is going to be rounded down to 0, and so that is why we get an incorrect result. The workaround is to either type these numbers as floating point numbers (shown in the function call on line 11), or, to provide the type information explicitly to the function call, as shown on line 12.

Note that whenever the type can be inferred from the function call, C++ will correctly deduce it for us, as we have seen up until now. But line 12 really shows how to work with templated code, as we typically would provide the types as arguments to the function as well. Type information is provided between the <> brackets, while the function argument's values are provided between the () brackets.

Towards correct template usage

I hope you can see why this calculator example is a bad example, other than teaching us that we can have variable data types, it is not really fit for purpose. So, with that in mind, what are then some good examples of templates? We'll work our way towards that, but we need to first see how we can apply templates to classes. Just like with functions, we can also add templates to classes, for example, we may have a templated class like so:

In this example, we allow the gradient to have an arbitrary data type called Type. This could be a float or double, i.e. single or double precision value and this may be useful for parallelising our code on a graphics card, where floats typically are more performant than doubles, though the gap in performance between both of them (in the context of graphics card parallelisation!) is not nearly as large as it used to be. You could, of course, also have other data types, such as ints or unsigneds if that made sense, though for CFD applications, it is probably going to be a floating point number if we are talking about gradients.

The example above is, however, just as bad as the calculator example (but we needed to introduce it just to see how the syntax works). Why is it bad? Well, it is better if you define a header file somewhere in your project where you define all your data types, for example, let's say we have types.hpp header file, which looks like the following:

Here, we just say that we want to introduce a custom floating point type, which we call FloatType and has a type of double. In C++, we ought to use the using keyword to define new types, but you may also come across its older brother typedef, which is more of a C way of doing things (and we should avoid it). But for completeness, we could have used it to define the same custom data type as

The #pragma once is a C++ oddity, it essentially tells the compiler to only ever include this file once during compilation, otherwise the compiler may complain that FloatType was already declared if we included the file more than once. We could have also used so called header include guards, but that would go beyond the scope here. But feel free to read up on include guards vs. pragma once to see their use cases and differences (mainly in syntax). With that, we can then go back to our Gradient class and simplify it to:

On line 2, we include now the newly created types.hpp header file, and on line 7, we use the newly defined FloatType for our gradient, no need for templates anymore! Though, it should also be mentioned that our gradient can only be a floating point number now, but this is not really a restriction, in-fact, it is actually better as a template can be anything and that may break your code if you were to pass the wrong template argument to your class. Now we are saying that gradients can only hold floating point numbers, which communicates clearer intent.

A CFD template example: Various RANS turbulence models

Ok, but let's look at a better example. Let's say we want to write an incompressible flow solver using the SIMPLE algorithm. We want to solve for turbulence as well and are content with RANS modelling for now. We are not really interested in the class, but rather, in how we store the RANS turbulence model in our class. If we just had one RANS model available, say the Spalart-Allmaras model, then we may have something like this:

This is simple enough, but eventually, our code will expand, and we want to have an inheritance hierarchy for our RANS model with a base class from which all other RANS models derive. Say we have then a RANS model base class which looks like the following:

On line 3, we defined a pure virtual function getReynoldsStressTensor(). If you need a refresher on pure virtual function, see the previous week's article on handling inheritance in C++. This means, we have generated an interface, and we can no longer create an object from this class, only from derived classes that implement this function. This means, that in our SimpleSolver class implementation, we can't replace line 4 with RansModelBase _ransModel;, this would result in a compilation error, as this class contains a pure virtual function.

So, we finally arrived at a good use case for templates. In order to store our RANS model now inside the SimpleSolver class, we use a template for the RANS type and then when we call the class, we specify which RANS model in particular we want to call, as shown in the following modified code.

On the first line, we define the template parameter now with class, instead of typename. You can use either of them, there is no difference, but some people (like me) like to use the convention that we use typename to indicate primitive types such as ints and doubles, and class if the type represents a custom class.

On line 5, we use our template parameter to define our RANS model. This means that in our solver, we can make a call now to _ransModel.getReynoldsStressTensor(); and, depending on the RANS model we passed to the class, we either calculate the Reynolds Stresses now with the Spalart-Allmaras, k-epsilon, or k-omega model, to name but a few, but you could have implemented any other RANS model here.

To construct a class with a specific RANS model then (say we have the aforementioned SpalartAlamaras, KOmega, and KEpsilon class available as RANS models), we have to use the <> brackets as seen earlier, i.e. we may have code like this:

We store the corresponding variables on lines 3-5 with the auto keyword, which is a mechanism by C++ provided to us to automatically deduce the type from the right-hand side of the assignment operator (i.e. the equal sign =). In this case, the right-hand side shows a type of SimpleSolver, and so our compiler will replace auto with that type. It is a feature that was introduced to C++ in 2011, when the language received a major facelift, and since then opinions seem to be divided somewhat on whether you should use it or not (I actually googled this just now and can only find positive opinions. This is not the feeling I get, though, when talking to other programmers. Maybe my programming friend circle has a rather pessimistic outlook on life ...)

Returning to the SimplerSolver class example, the beauty of it is that we don't care which model is being used to get the Reynolds stresses, and, if we later add more RANS models, we don't have to change the SimpleSolver class. This is another important software engineering principle, referred to as the open-closed principle.

Why templates are good: An example without templates that violates the open-closed principle

To understand this better, let's get rid of our templates and look at what the code within the SimpleSolver class would look like. In the following, look at the code and honestly answer the question of whether you have generated code yourself like this. I can think of countless examples when I have done this in the past, and identifying that we have done (or currently still are doing) this, will help us to improve on this coding style on you will realise that expanding your code becomes a lot easier. The open-closed principle really does have a big impact on maintaining your code, and it will reduce bugs as well.

So let's get back to the code. If we do not use templates, instead of making a call to _ransModel.getReynoldsStressTensor();, we would generate messy code like the following:

Apart from the fact that we had to use 11 lines of code instead of one (talk about productivity and DRY code!), this code is no longer following the open-closed principle; every time we add a new turbulence model, we need to change this implementation. You can tell it is a mess, especially if we wanted to implement every RANS model under the sun, then this if-else statement would just become too long to read, and we are likely introducing just more bugs. I am ignoring the fact here that string comparison itself is a bad practice (if you had a typo in your string, you introduce a nasty bug, as the code is fine, just your spelling is wrong), and we would generally prefer to use Enum types here for comparison.

For completeness, let's also introduce enum then to see how we could improve the above code (but it would still violate the open-closed principle). Remember the types.hpp file that we used before to define our floating-point variable type? Let's use that file again and add the following code to it:

an enum type is a list of identifiers, that each are represented by an integer. In this case, we set SPALART_ALLMARAS equal to 0, and then each following entry will increase their index by a value of 1, i.e. we have K_EPSILON being equal to 1, and K_OMEGA being equal to 2. This allows us to make comparisons with integers (0, 1, 2), instead of Strings ("SpalartAllmaras", "kepsilon", "komega"). To use it in our example above, we have to first include the types.hpp file where we define our class, and then we can use code as:

Now, we are comparing integers, rather than strings, which is less error-prone and the right way to do things in C/C++. So I hope you can appreciate the use cases for templates now and make it your goal to learn more about them. I can't possibly cover everything there is to know about templates, but remember the excellent resource on C++ programming: LearnCPP, my favourite place to read up on fundamental C++ concepts.

One last thing on our above implementation, let's say we later decide we wanted to also support LES and hybrid RANS-LES models, such as DES and SAS. As long as these classes also expose a function called getReynoldsStressTensor(), we could use these as well in our SimpleSolver class, without any modification. You see the power of the open-closed principle now. If we were to try to maintain the if-else nightmare, then we likely would have to change things once we realise we also want to support scale-resolving turbulence models, i.e. LES, DES, and SAS.

Templates help us to achieve clean code, but they also come with a cognitive overhead. Reading your code becomes more complicated, as it is not necessarily clear which template parameters are being used when reading a class, for that, the explicit object instantiation has to be looked up, of which there may be many.

Static Polymorphism with Templates

We have already alluded to the fact that there is such as thing as static polymorphism in the last article on inheritance. This is, in my opinion, where templates really shine. They allow us to do the same thing as inheritance or dynamic polymorphism (using virtual functions and then at runtime determining which derived class is being called, adding a runtime overhead), with some limitations, but by using templates, we drop the cost of the runtime overhead, making the calls to our functions much faster. In the article on inheritance linked above, we looked at an example at the end where this runtime overhead really becomes a burden, i.e. when looping over millions of faces. Static polymorphism (trying to achieve the same functionality as virtual functions just with templates) becomes a better choice at this point. Since template types are resolved at compile time, we no longer need to check at runtime which derived class is called, removing this look-up overhead and ultimately increasing performance.

Example: Writing an interpolation class

Let's introduce an example to make things easier. We want to write an interpolation class, which should have a base class so that we can derive different schemes from it. Using dynamic polymorphism, i.e. inheritance through virtual functions, we may come up with a code like this:

We define a base class on line 3 which only has one pure virtual function. On lines 8 and 15, we provide specific implementations in the form of an upwind and central interpolation/reconstruction scheme. In the main function, we can see that calling either the upwind or central scheme, uses the same interface and so if we added more interpolation or reconstruction schemes, they would follow the same pattern (promoting the open-closed principle).

However, we have already identified that this adds runtime overhead, and in reality, we would first have a loop in time (or pseudo-time for a steady-state solver), and then a loop over space for each timestep/iteration, in which we would call the applyReconstruction() function. If we have 1000 iterations or timesteps, our mesh contains 1 million cells, and we are solving for the 3 velocity components Ux, Uy, Uz, as well as the pressure p, then we would call this function 1000 * 1 000 000 * 4 = 4 billion times! Every time we check if the derived class is either of type Upwind or Central. And now imagine you have about 60 different interpolation schemes implemented (as is the case in OpenFOAM, for example), then you need to check 4 billion times which of the 60 reconstruction schemes should be used.

Is this overhead significant though? There is a good discussion on that over at StackOverflow. In short, the look-up of which derived class needs to be called takes one (clock) cycle. Your CPU is typically rated in GHz, i.e. these days CPUs come with clock cycles around 3 - 4 GHz, meaning they can process 3 - 4 billion instructions per second. So, in the above example, our code execution would, in theory, be only about 1 second slower. However, CPUs these days are rather complex systems, featuring instruction-level parallelisation out of the box. When we introduce virtual functions, the look-up of which derived class should be used, brings the entire instruction pipeline to a halt, so even if the look-up itself is quick, the true cost is much larger than 1 clock cycle since everything else comes to a stop. In the example above, the true cost could be as high as 650 clock cycles, this would bring the runtime overhead to around 11 - 15 minutes, which would be considerable. The discussion above also points out that this is the worst-case scenario for the processor tested, and that the true cost is probably somewhere between 1 second and 11 - 15 minutes for the look-up.

So, how can we improve this with templates? The following example shows that by adding a fourth, templated class, we can still keep our class inheritance as we have seen before with virtual functions, but force the look-up at compile time.

There is a bit more code to go through here. First, we define an abstract base class, which we call ReconstructionInterface here and which has only one pure virtual function called reconstruct. Then, we define two classes as before, called Upwind and Central. Both of these classes inherit from the ReconstructionInterface class, to ensure that both of them implement the same interface, i.e. the reconstruct method. Again, we do this to ensure that every time we make a call to the reconstruct method, regardless of the interpolation scheme, we can guarantee that this function exists and is implemented. We could use the Upwind or Central classes now, as we did before, and this would result in the runtime overhead again. However, we introduce a new class called Reconstructor on line 23, which helps us to avoid this overhead.

This class now has a template parameter called SchemeType, and, this is where things get interesting. Notice how on line 23 we are inheriting from the template parameter itself. So when we are constructing an instance of the Reconstructor class, as we can see on lines 32-33, we specify which interpolation class to use as the template parameter. Then, on line 23, we inherit from this scheme. Since the scheme itself was derived from the ReconstructionInterface class itself, we are guaranteed that the reconstruct() method is implemented within all derived classes, i.e. here the Upwind and Central class (and this is the reason we used this inheritance hierarchy again, just to make sure we have a well-defined interface and each interpolation scheme has the same functions available). And since the type of the interpolation scheme is explicitly known at compile time, as we defined it in the code on lines 32-33, we no longer have to look up which scheme or class to use.

On line 26, we define a function called applyReconstruction, where we use the SchemeType::reconstruct() call. We derived from the SchemeType class, and now want to use the reconstruct() function defined within it. Since we want to call the method without creating an instance, or object, of the class, we call the method directly through the class accessor, i.e. ::, instead of using the dot operator. This avoids an additional memory allocation, however, we could have also used the more familiar syntax with the dot operator as shown below (for just the applyReconstruction() function)

One final thought, inspecting lines 35-36 in the previous code snippet, we see that making the call to our interpolation class, now with the templated version, looks indistinguishable from the call we made before when we used dynamic polymorphism, i.e. virtual functions. So the syntax remains the same, only the construction of the interpolation objects in this case is slightly different, requiring a template parameter. Almost the same syntax, and by adding a bit of template magic, we have reduced a rather significant computational overhead.

Templates in OpenFOAM

I wanted to have a quick look at OpenFOAM as well and see a use case for how they are using templates. In one of our examples above, we looked at using a turbulence model within our SIMPLE algorithm class. This example was not chosen out of thin air, it allows us to directly compare our implementation to the one given in OpenFOAM. We can see that, in OpenFOAM, the SIMPLE algorithm is implemented in the following way:

Ignoring the autoPtr here, which is a precursor to smart pointers (which we will look at in an upcoming article), you can see here that we use a template argument of the type incompressible::turbulenceModel here and call the variable turbulence. Later in the solver, we use this variable to make this call:

This call is the same as what we did before with _ransModel.getReynoldsStressTensor();, however, OpenFOAM's call handles a few more things and applies the Reynolds stresses for us without us having to do anything extra. This is a nice and clean code design, and, in their case, it works for any turbulence models (there are restrictions, as OpenFOAM's SIMPLE solver is steady state only, so no LES, DES, or SAS, but other transient solvers support these turbulence models as well). Since the turbulence variable is a pointer, we also use the arrow notation, i.e. -> instead of the dot to call a function, this is just how things are done in C++ to differentiate between the two.

Summary

This is all that I wanted to cover for templates. I should mention, though, that templates are really powerful in C++, and some codes take it to the extreme and use them everywhere. So much so that you have template arguments for class definitions which have their own, nested, template arguments. While these concepts will definitely improve performance, if used correctly, they add a lot of cognitive strain on the code developer, as it is not always evident which class can be used as template arguments. A golden path somewhere between not using templates at all, and using them for every class, is probably a good compromise.

There are still many concepts on template programming which I can't go into detail, but you have seen the most common use cases. Don't use templates if you just want to have the option of choosing either floats or doubles as your floating point data type, use a typedef instead. But as we have seen in our comparison at the end between static and dynamic polymorphism, there are some really good use cases where template programming can really improve our performance. If you are developing your own codes, and you are using C++, try to spot where templates may be useful and start to introduce them here and there.

If you have a feeling that you have understood all of the template codes in the section, there is likely not much that will scare you when it comes to templates. Just remember, the reason we have to use templates is to provide type-safe code in C++, and this is required so we can know at compile time the type and thus can make optimisations as necessary (and checking that operations can be performed), which means that we get faster execution speed. Use templates for speed and don't shy away from them, they allow us to write clean code and thus you need to know how to use them in your own code.

Tom-Robin Teschner is a senior lecturer in computational fluid dynamics and course director for the MSc in computational fluid dynamics and the MSc in aerospace computational engineering at Cranfield University.