cfd.university

Classes and structures are what make C++ so powerful compared to its predecessor C. While classes are, broadly speaking, just a logical collection of functions and data. This data is used by these functions, and so classes and structures offer no obvious direct benefit (other than organisational benefits). However, they do allow for a different kind of programming (object-orientated programming) which fundamentally changes the way we design and implement code. This article will showcase the essential features of classes and structures and make you an object-orientated programmer if you have not already embraced this paradigm.

In this series

In this article

A motivating example

In this article, I want to take a different route of introducing classes and structures, compared to what you would normally see in the literature. So if you already know something about classes, hopefully, this will provide you with a fresh and new view of this topic. For the moment, we will only concentrate on classes, and then later extend our knowledge to structures, and you'll see that they are pretty much the same, but we use them for different things. Suffice it to say, that what we are going to discuss about classes also holds true for structures.

The best way, in my opinion, to understand classes is by coming up with an example where they would be useful. Let's say we wanted to write our own linear algebra library. At some point, we may want to do some operations with a matrix, as shown in the following example.

We first define a matrix type on line 4, which is just a two-dimensional array consisting of two std::vectors. On lines 6, 10, and 14, we define three different functions of operations that we may want to do to our matrix. Now there is one thing programmers are allergic to, one thing that every programming language wants to avoid, and one thing you'll learn on day one-ish of a computer science degree; DRY.

Avoiding code repition

What is DRY? It is an acronym for Don't Repeat Yourself, in other words, never copy and paste code, if you find yourself doing it, you must stop. DRY is also referred to as knowledge duplication. If you have to write the same code twice (or copy and paste it), you are duplicating the logic that the code implements and thus duplicate knowledge. Once you change one implementation, the other implementation becomes outdated and you need to remember to fix it, if you don't, your application likely still runs, but it will produce inconsistent results at best and plain wrong results at worst. Programming languages provide us with features to avoid code repetition and we can extend this concept to our semantics as well.

On this note, you have to applaude the computer science community for then coming up with the acronym WET, which is supposed to be the opposite of DRY code, i.e. lots of repetition. Since the acronym was formed first, there are many interpretations for what WET stands for, including write everything twice, write every time, we enjoy typing, and waste everyone's time. Don't be wet, be dry!

If we look at the sample code above, we can see that all function names contain the word Matrix, and all functions accept, as one argument, the matrix itself (and note how the matrix is call by reference while the other arguments, located on the stack, are call by value. If you need a refresher on the stack and the heap, and call by reference and value, look at my write-up on memory management in C++). The naming of the function is somewhat unfortunate, but we can live with it if we have to. But the function arguments are something else.

Let's say we realise later that actually there are some matrices which are always three by three, i.e. transformation matrices in three-dimensional space, or tensors. Of course, we could use the current Matrix type that we defined before, but since this is using std::vectors, its memory will always be allocated on the heap, and for a three-by-three matrix, which comfortably fits on the stack, we may want to provide a different type which fits on the stack to boost perfromance. Then, we may want to introduce a new type and fix the size of the matrix at compile time, and for this, we have a new data type available to us; the std::array:

Since we already reserved the name Matrix for our dynamically allocated matrix, we had to come up with a new name, here Matrix33, but you could have named it any way you wanted. However, while this will avoid naming conflicts, you may argue that having just a Matrix type now is confusing, as it doesn't really communicate that this is the dynamic version, and you want to have clear name separations of your variables, so you decide to rename it, i.e. you now could have something like this:

But now we have changed the type name, and so we have to change the type names everywhere in the code. See, we just repeated ourselves by typing out the type name in each function, and so our code isn't DRY. I know this seems pedantic, but I want to prime you for these situations. Sometimes you will also hear this type of issue in code referred to as bad code smells, i.e. code that works but which likely will become a mess as it grows. Imagine we had implemented a bunch of functions in different files where the matrix object is needed, then you would have to change it everywhere, and we just want to avoid this as much as possible.

And, to satisfy the super pedantic readers, yes, there are even better ways to deal with this particular situation, but I don't want to go down that rabbit hole. But if you are interested, many companies provide their own implementation of the std::vector class, where memory is allocated on the stack if it is small enough, and moved to the heap once the size becomes too large. Facebook, (or meta?), for example, use a small_vector class, that behaves very similarly to a std::vector but optimises memory usage on the stack where possible.

Classes to the rescue

So, this leads us naturally to classes and structures. A class, essentially, is a collection of functions that operate on the same data. In this example, our class would contain the matrix itself, and then each function within the class would have access to this matrix and be able to manipulate it. Let's flesh out the same example now (just for the dynamic matrix version, i.e. the one using std::vector, but we'll learn better solutions later when we discuss templates. Also, I should stress at this point that you would not actually use a 2D std::vector here to store an actual matrix of this type in a CFD solver, as most entries will be zero. We would prefer a sparse matrix storage type, such as Compressed Row Storage (CRS) and I'll likely come back to this later and show how to write our own linear algebra package using this type of storage):

Let's go through this in more detail. On line 4, we declare a class called Matrix, which goes all the way to line 23 (and notice how each class definition is finished by a semicolon). We ignore the keywords public and private on lines 5 and 19 for now and come back later to them. On line 6, we declare a MatrixType as we have done before, this allows us to reference the type later if we need to. We define then a matrix on line 20. On lines 21-22, we also specify the number of rows and columns our matrix should have, and you'll notice that variable names now start with a leading underscore.

The opinions in the C++ community are somewhat divided on the leading underscore, technically you can name your variables anything you want, but within the core implementation of the C++ runtime library, variable names are declared with two leading underscores. The argumentation goes that a leading underscore for your name may be confusing and could be mistaken for C++'s internal variables, and the argumentation goes that you should have then a trailing underscore, rather than a leading underscore (i.e. instead of _matrix, use matrix_).

Some also advocate a different naming, starting with a leading m_, i.e. we would have now m_matrix, where the m refers to a member of the class. I have, however, decided to stick with the leading underscore in my code for a simple reason; if you look at other programming languages, a leading underscore seems to be the norm. It is just us C++ programmers who have too much time to have serious discussions about leading vs. trailing underscores. There are some languages where a leading underscore is actually required to make variables private (again, we'll get to that), so, out of habit, and to stick with conventions pretty much used everywhere else as far as I can see, I'll be using leading underscores.

Class types

Let's jump to line 29 for the moment, we see that we are using our matrix class here. And, if we compare that to, say, line 28, we see that each variable declaration follows the form Type Name, i.e. first we specify the type (either unsigned (line 28) or Matrix (line 29)) and then the name of the variable with that type (either norm (line 28) or matrix (line 29)). From this, we can understand that classes are nothing else than custom types.

C++ provides us with built-in types such as int, unsigned, float, double, bool, and so on, and classes allow us to craft our own types. In object-orientated programming, every time we create a new variable with a custom type (like the Matrix type here), we say that we instantiate an object based on the class. A class is sometimes also referred to as a blueprint, because it outlines the structure of the type, but then we can have many different objects based on this type, i.e. we can have different matrices of different dimensions.

Creating and deleting objects: The constructor and destructor

When we create a new variable with our custom type (here the Matrix class), there is a special function that is called; the constructor. Our constructor is declared on line 8, and you can always spot the constructor, as this is the function that has the same name as the class itself. You see that the constructor accepts two arguments here, and we use them to initialise the state of our class. In our example, we simply pass the arguments for the number of rows and columns, and then we store them in the variables we declared on lines 21-22. We can also see, that we pass these arguments to the class on line 29 when we instantiate our object.

Related to the concept of the constructor is the destructor, and this is provided on line 13, it still has the same name as the class itself but has a leading ~ to indicate it is the destructor. The destructor is called every time a variable is deleted. The idea is that you set up the state in the constructor (i.e. allocate memory) and then in the destructor you deallocate any memory as needed. In my view, the destructor has lost its significance somewhat as C++ now offers data types and containers that do all memory allocation for us (correctly on the stack or heap, we just need to use the right container for the task we are solving). Thus, we should never really do any memory management ourselves, unless we know what we are doing and doing so because of performance optimisation (i.e. we may want to lay out our memory differently for certain data types so that we reduce cache misses, but this is really such an exotic example that you likely never have to do this yourself!).

However, there is one use case for the destructor that I can think of, and that is working with files. If you are reading or writing files, and you want to create a custom class for that, then you would open the file in the constructor, provide functions that can interact with the file and then close the file in the destructor. I don't have any other use case for the destructor these days. Again, if you find yourself using the new keyword in the constructor and the delete keyword in the destructor (for which both were originally (ab)used), think again. You can entirely avoid manual memory allocation in this way using either containers such as std::vector or smart pointers, and we will look at both of them in more detail in later articles.

Just to finish off this thought, since we have to do all memory management ourselves in the constructor and destructor (if we have to), memory management does not come automatically for us and garbage collection remains the task of the programmer. This is a good thing, though, by explicitly stating when to allocate and when to deallocate memory, we are taking that responsibility away from the C++ runtime library and also knowing this in advance means we can ask the compiler to optimise this if there is anything to optimise.

Functions in Classes (methods)

The functions we had before now live inside the class on lines 15-17 and, since all of them now are part of the matrix class, no longer contain the word matrix in them. You'll also notice that the argument to pass in the matrix has gone now. In C++, functions that live inside a class are referred to as methods, while the data that is located within the class is referred to as members. I have, however, seen just as many people using the work function out of habit when talking about methods and so I'll stick to this naming convention as well.

On lines 31-33, you can see how we are calling the functions of our class. We use the so-called dot operator, that is, we use the variable name, followed by a dot and then the function name (i.e. object.function()). This is the case for objects that are allocated on the stack. If we declared our object as a pointer, then we can't use the dot operator anymore and use the arrow operator instead, i.e. object->function(). There is another exception, and that is if our functions are static (static functions can be used directly from a class without having to instantiate an object, there are some good use cases why we would like to do that, but that is a bit beyond the current scope). If you ever have a static function in your class, then you use the double colon operator, i.e. class::function(). Notice that we are not using the object here but rather the class directly.

Just look at the code on lines 31-33 again, I think we can all agree that this implementation states clear intent and makes the code very easy to understand. The example code from our motivating example was not necessarily more difficult to understand, but as your code grows and as you introduce more and more classes, having a simple syntax like this will make your code more readable.

Restricting access: private, protected, and public access modifiers

Now, I want to come back to lines 5 and 19, respectively, where we declare two new keywords; public and private. These are referred to as access modifiers and tell the compiler who will have access to certain aspects of the class. This means we can control what the class should be exposing to anyone interacting with the class, and what we would like to remain hidden. As you probably have guessed then, if we declare a section public, any code that follows underneath this declaration, i.e. here from lines 6 to 18, is visible to anyone using the matrix object, and thus they can call it. For example, our functions on lines 15-17 are within this public scope, so we can call them, as we do on lines 31-33. If we declared these functions private (i.e. if we inserted a private: statement on line 14), then you would get a compiler error such as: error: 'void Matrix::transpose()' is private within this context and your code would not compile (this is the error message the GCC compiler would give).

On line 19, we switch the visibility to private, meaning we can't make calls in the code such as matrix._matrix (there are certain situations where you can, but again, for simplicity, we'll skip over that for now). The compiler would produce an error again, though if you declared that section public, your compiler would again allow you to directly access the variables of the class.

So why bother with private and public? Well, let's take an analogy, if you administer a network at a company, you want different people to have different access rights to your system. IT people may have access by default to servers and more secure network locations whereas an employee may only have access to a shared drive, but certainly not to security-related servers and their settings. It just comes down to security, you don't want people (or code) to accidentally change things they have no business changing.

Let's go back to our code and let's say we declared line 19 as public, now everyone can access the variables that make up the matrix class. If someone then decided to make a call to the nature of: matrix._numRows = 5000, that would be legal. But then, later when you make a call to say matrix.transpose(), where you have to loop over the entire matrix, now you would loop over 5000 rows, instead of 1000 rows as specified in the constructor. However, during initialisation we only allocated memory for 1000 rows and so your code now tries to read memory which isn't there and, as a result, will crash. This code was working perfectly fine before but due to an incorrect usage of the class it is now crashing and you have to figure out why. This is a somewhat constructed example, but you get the idea, we don't want to be able to accidentally change variables and their values to something that will break our code.

Hiding implementation logic: private functions

Related to that, we may want to implement some functions that should be only available to the class, but not the code calling the class. For example, the initialise() function is an example of a function that should not be exposed to the caller. Why? if they forget to call it, no memory will be allocated and when you try to use your matrix object you'll get an error message and you, personally, get a lot of angry open issues on GitHub of people telling you that your code is broken. This is not the case, they haven't called the function, but really, it is not their job to look after memory allocation, it is the class' job to do that. So, it would be more natural to declare the initialise() function private, and then call it automatically from the constructor, to ensure memory is always allocated, i.e.:

Now we declare the initialise() function private by putting it on line 11, and then we called it as the last instruction of the constructor on line 7. Your users are happy, and your GitHub issues remain at zero.

A key concept in object-orientated programming: Encapsulation

Segregating your data and algorithms (functions) into visible and hidden states (public and private) is a fundamental concept in object-oriented programming and is referred to as encapsulation. We want to hide all implementation logic (this includes the data, i.e. variables) from the caller. All the caller of the class should be able to do is call functions that they are interested in, such as calling the transpose function or calculating the norm of the matrix in our example. They are not interested in allocating memory, and that should be handled by the class itself. A general class structure may look like the following

Generally, we use the constructor to set up the class so that after the object has been created, it is ready to use. In our matrix example above, we called the initialise() function from within the constructor which would take care of memory allocation and setting all entries to zero, initially. After that, we are ready to use the class. Or, if we are using a mesh reading class, we could implement all logic to have the mesh read in different private functions, which are then called in the constructor. The only thing the caller of the class has to do is to get the part of the mesh they are interested in, for example, the coordinates or boundary conditions. In this way, the caller would not get an error message if they had forgotten to call a few functions before trying to get the coordinates or boundary conditions.

Encapsulation is one of the most fundamental principles that object-orientated programming exposes to us, making good use of it means that we write clean and easy to (re)use code, and in a way, that follows our DRY principle. Write the code once, and then reuse it in other projects again and again.

Classes vs. Structures

Finally, let's get back to structures, or structs as they are known in C/C++. Classes and structures are the same in C++, as alluded to in the beginning. The difference is that classes are, by default, hidden (that is, they have their default access modifier set to private), while structures are, by default, visible (that is, they have their default access modifier set to public). Consider the following code:

Both the class and struct definition are equivalent in this case, as we set the access modifier to public as the first thing in the class definition. Equally, we could have set the access modifier to private in the struct, and this would have been then equivalent to the behaviour of a class. The reason we have both classes and structures is that C contains structures, but these were only really used to group together common data types. For example, if we wanted to create a three-dimensional vector in C, then you could argue that the x, y, and z components all belong to a vector type, so you would, for example, generate a vector like so:

On line 1, we declare a Vector structure which contains the x, y, and z components, which we can now directly access when generating a vector object from this structure, as shown in lines 7-10. This also shows how we would typically use structures and classes. A class contains implementation logic, i.e. functions, that can manipulate internal variables, as we saw in our matrix class. We could add support for matrix-matrix or matrix-vector multiplication, which would be a common use case for manipulating the internal state. Structures, on the other hand, are typically only used to store raw data and group them into logical units, like in the case of a vector. Another example may be a cell face, where we would store a normal vector and the area.

Some example classes

Before we finish this article, I wanted to provide some examples of classes you may encounter, and also provide some idea of what data you would store, as well as what functions you would implement, all related to CFD code you may write. The following contains a list where we first give the class name in bold, followed by a description of what it does. It is followed by a list of functions in bold as well as variable names in italics, indicating what functions and data types we would store.

Mesh reading class

When we write a CFD solver, we need to either generate a mesh of some description, or read it from file. It only makes sense to put all of that logic into its own class. An outline for this is given below

Gradient calculation class

The Navier-Stokes equations are given in differential form and thus calculating gradients (and divergences) is a mandatory step. Even if we reformulate the Navier-Stokes equations into an integral form, as it is commonly done to use the finite-volume approximation, we still need to evaluate gradients for the viscous terms. In this case, we may want to implement a gradient calculation class similar to the following outline

Linear algebra class

We have looked at the Matrix class example extensively in this article, but we probably want to use some form of linear solver on the matrix to calculate the solution vector. An outline for a linear algebra solving class may look like the following

Remarks on class structures

In the above examples, I have put both gauss() and leastSquares(), as well as CG() and GMRES() in the same class. In reality, we would not do that, but for that, we first need to introduce the concept of inheritance, which we will look at in the next article. But I hope you start to see a pattern and how you would use classes now in your code design. We will talk later about how we would structure classes, but for now, this should be sufficient to get you started with object-orientated programming.

Summary

In this article we looked at classes, and its extension to structures. We looked at the basic building blocks that make object-orientated programming so powerful. In particular, we looked at two principles, DRY (don't repeat yourself) and encapsulation, and we looked at how both of these help us write clean code.

Object-orientated programming is not just important for C++, but other languages have embraced it as well and probably this is the single most important thing to learn in programming if you have not done so already. If you ever want to work on a CFD solver, be it OpenFOAM, SU2, or any other in-house solver, chances are you have to work with classes and C++, and if you understand classes in C++, you can use that knowledge in other programming languages as well.

There is so much more to explore here and we will work our way through some more important features. In the next article, we will look at another important design principle; inheritance.

Tom-Robin Teschner is a senior lecturer in computational fluid dynamics and course director for the MSc in computational fluid dynamics and the MSc in aerospace computational engineering at Cranfield University.

Object-orientated programming in CFD