cfd.university

In this article, we look at why you should care about libraries/dependencies in C++ and how they will make your life easier as a programmer. Treat it as an introduction that covers the basic building blocks. We look at static libraries, dynamic libraries, and header-only libraries, their use cases, as well as their advantages and disadvantages. We will also see how these library types inject their code at different stages of the compilation and linking process, and the role that header files play for libraries. By the end of this article, you should have a good high-level grasp of libraries/dependencies in C++.

How we got here, and why you should care about libraries/dependencies in C++

Libraries are a wonderful thing. They help us to achieve common tasks quickly, often with just a few lines of code. They take a lot of heavy lifting off our shoulders and are written by C++ experts, who know how to deal with memory efficiently and how to write clean code to reduce bugs. So why would we not want to exploit their goodwill and use their code which is provided absolutely for free? There are no good reasons, and we should always be using someone else's library before we start to write any code on our own.

Unlike in other languages such as Python, dealing with dependencies (which is just another word for libraries) is not straightforward in C++. C++ is a compiled language as we know by now (and you can refresh your memory with my articles on choosing the right programming language and why you should use C++). Its compiled nature means that we first have to compile libraries, which can be a challenging task for new programmers. When you are hit with an error message, it often provides little useful information to identify the root cause, and debugging compilation errors becomes a pain.

Thus, it is easy for us to give libraries a try, but then quickly abandon them as we can't seem to understand how to use them. Even if we do get them compiled, chances are you will look at documentation which is not helpful at all. No developer likes to write documentation, and chances are the documentation you are looking at was put together in haste, or was written by someone who does not care. Either way, all of these factors should not deter us from appreciating the usefulness of libraries once we work through all of the above issues.

What to expect in this series

Therefore, in this series, I want to take a deeper look at libraries; not just how to compile and link them to our code, but understanding libraries from the ground up. This will provide you with the tools to combat compilation errors and get to a point where you can reliably build and use third-party libraries. You will see that this will make your code that much cleaner, easier to maintain, and probably also faster. And don't worry, if you feel this is all overwhelming, we'll also look at tools to help us with the management of our dependencies. However, if you understand the build process of libraries, you'll be able to troubleshoot any issues that may arise.

In this article, we will explore the different types of libraries that exist in C++. We look at the three types we need to be aware of, how to identify them, and when to use them. See this as the fundamental building block from which all other articles in this series derive. I hope you will make an effort to learn dependency management in C++. Your future self will thank you!

Definition of a library

A library is simply a collection of code, typically plain functions, that can be called from someone else's code. You can also write your library using classes, but that limits you to C++, and C developers, for example, would not be able to use your library. The key point of a library is that it does not contain a main function. This function has a special meaning in C++, as it marks the point where the execution of the program starts. Therefore, a library can't contain an additional main function. Otherwise, your code and the library would both claim to be the starting point, and the build would typically fail with a duplicate definition of main.
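As a minimal sketch of what this means in practice, the following shows what a tiny library source file could look like. The function name cflNumber and its parameters are hypothetical and chosen purely for illustration. The function is declared with extern "C" so that, under the assumption that C developers should be able to use it as well, it can be called from C code. Note that there is no main function anywhere in the file.

```cpp
// Hypothetical library source file: plain functions only, no main().

// extern "C" switches off C++ name mangling, so the function can also
// be called from C code (an assumption about who uses this library).
extern "C" double cflNumber(double velocity, double timeStep, double spacing) {
  // CFL = u * dt / dx
  return velocity * timeStep / spacing;
}
```

An application would then provide its own main function and simply call cflNumber() from there.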

There are several ways to package code without a main function so that an application can use it. Specifically, we have static, dynamic, and header-only libraries that all achieve the same outcome (using additional code in our application) without having additional main functions to deal with. If we compile a library, we can typically choose if we want to have a static or dynamic version of it. If it is a header-only library, we can't compile it separately and have to use the library as is. Let's look at them in turn.

Static libraries

Whenever you deal with libraries, you make the library code available to your application. This means that when you build your code, your compiler and linker need to know where that library code is. A static library gets around this by shipping its compiled code (object files) as an archive, similar to a zip file (but in a format your linker understands, typically with a .a extension on Linux and .lib on Windows). The code from this archive is then copied into your application during the linking stage. It is called a static library because the code now sits next to your code in the executable, and so when you run your application, the library code will always be there.
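To make the archive idea more concrete, here is a hedged sketch of how a small static library could be built with GCC on Linux. The file names (fieldlib.h, fieldlib.cpp, libfieldlib.a, main.cpp) and the function fieldSum are made up for this illustration, and the header and source file are shown in a single listing for brevity. The build commands are given as comments at the end.

```cpp
// --- fieldlib.h (hypothetical header shipped with the library) ---
#include <vector>

// Declaration: all the compiler needs to see in the user's code.
double fieldSum(const std::vector<double>& field);

// --- fieldlib.cpp (hypothetical source, compiled by the library author) ---
double fieldSum(const std::vector<double>& field) {
  double sum = 0.0;
  for (double value : field) sum += value;
  return sum;
}

// Possible build steps with GCC on Linux (file names are assumptions):
//   g++ -c fieldlib.cpp -o fieldlib.o      # compile to an object file
//   ar rcs libfieldlib.a fieldlib.o        # pack object file(s) into an archive
//   g++ main.cpp -L. -lfieldlib -o app     # link the archive into the executable
```

After the last step, the executable contains the code of fieldSum() itself, so libfieldlib.a is no longer needed to run it.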

An upside of static libraries is that nothing has to be located while we run the application, which avoids annoying runtime issues with libraries. However, this also means that if you include the library, your executable file size will grow by the library code that ends up in it. Think about a situation where you want to perform a simple task and perhaps write an application with 100 lines of code. Then, you want to include the Boost library, which as of the time of writing features just over 6 million lines of code. In the worst case, your executable would then contain the compiled code of those 6 million lines plus your 100 lines. In practice, the linker usually only copies the parts of the archive that your code actually references, so the growth is often smaller, but it can still be substantial.

Static libraries are easy to deal with and for smaller applications perhaps preferred. Once a library grows in size, you may want to exclude it from your compilation process to keep your executable file size small. Chances are you want to include several libraries, at which point you would end up with ridiculous file sizes. At this point, it may be better to look at dynamic libraries instead!

Dynamic libraries

The main difference between dynamic and static libraries is that the code of a dynamic library is not copied into your executable when you build it. Instead, you access the code of a dynamic library at runtime, i.e. while your code is executed. This creates an issue, as the address of a function is typically known once your code has been compiled and linked. Let's look at a quick code example:
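The listing itself is not reproduced here, so the following is only a minimal sketch of what such an example could look like. The wrapper function showFunctionAddress and the variable name functionPointer are assumptions made for this sketch (in a complete program, the body would sit inside main). It is laid out so that the function pointer, the print statement, and the function call fall on lines 9, 12, and 14.

```cpp
#include <iostream>

// A function that, as the name suggests, does nothing.
void doNothing() {}

// Hypothetical wrapper for this sketch; in a full program this would be main().
bool showFunctionAddress() {
  // Function pointer that stores the address of doNothing().
  void (*functionPointer)() = &doNothing;

  // Print the memory address of the function.
  std::cout << reinterpret_cast<void*>(functionPointer) << std::endl;

  doNothing();
  return functionPointer != nullptr;
}
```

The printed address will differ between systems and possibly between runs. Converting a function pointer to void* for printing is only conditionally supported by the C++ standard, but GCC accepts it.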

Using a function pointer on line 9 (which we introduced in our article on Lambda expressions), we can print the address of the function on line 12. When we call the aptly named function doNothing() on line 14, the compiler will know exactly at what memory address this function can be found and the instruction will jump there.

With dynamic libraries, as they get loaded into our application at runtime, not during compilation, we can't know the address of a function in advance. The way to make dynamic libraries work is to compile them as position-independent code (PIC). Then, compilers will use relative addressing or position-independent instructions to call functions within a dynamic library.

The advantage of a dynamic library is that the file size of your executable remains small. Furthermore, you can update libraries without recompiling your code, as long as the interface of the library remains the same (i.e. the function names and arguments to the functions haven't changed). Their main disadvantage is that loading libraries at runtime is challenging, and not universal across different operating systems, putting a burden on the library developer. However, we have tools to help us with this and we will look at them in later articles.
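As a rough sketch of how this could look with GCC on Linux, the listing below shows a small library function together with possible build commands as comments. The file names (fieldmax.cpp, libfieldmax.so, main.cpp) and the function fieldMax are hypothetical. The -fPIC flag asks GCC for position-independent code and -shared creates the dynamic (shared) library.

```cpp
// --- fieldmax.cpp (hypothetical source of a dynamic library) ---
#include <algorithm>
#include <vector>

// Returns the largest entry; this sketch assumes a non-empty field.
double fieldMax(const std::vector<double>& field) {
  return *std::max_element(field.begin(), field.end());
}

// Possible build steps with GCC on Linux (file names are assumptions):
//   g++ -fPIC -c fieldmax.cpp -o fieldmax.o    # position-independent object code
//   g++ -shared fieldmax.o -o libfieldmax.so   # create the dynamic library
//   g++ main.cpp -L. -lfieldmax -o app         # library code is not copied into app
//   LD_LIBRARY_PATH=. ./app                    # tell the loader where to find it
```

If the implementation of fieldMax() changes but its interface stays the same, only the first two steps have to be repeated; the application itself does not need to be rebuilt.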

Header-only libraries

Header-only libraries live in header files, exclusively. Typically, you only provide function or class declarations (the interfaces) in header files, and then provide the implementation in source files. However, there are complications, for example when using templates. Code for templates can only be provided in header files unless we restrict the template types that are allowed. In that case, we can provide the implementation in source files again.
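Restricting the allowed template types is done through explicit instantiation. The following is a minimal sketch, with the hypothetical files norm.h and norm.cpp shown in one listing and a made-up function sumOfSquares. Only the declaration would go into the header, while the implementation and the list of permitted types stay in the source file.

```cpp
#include <vector>

// --- norm.h (hypothetical header): declaration of the template only ---
template <typename T>
T sumOfSquares(const std::vector<T>& values);

// --- norm.cpp (hypothetical source): implementation, hidden from users ---
template <typename T>
T sumOfSquares(const std::vector<T>& values) {
  T result = T(0);
  for (const T& value : values) result += value * value;
  return result;
}

// Explicit instantiations: only these types are available to users.
template float sumOfSquares<float>(const std::vector<float>&);
template double sumOfSquares<double>(const std::vector<double>&);
```

With the two files kept separate, a user calling sumOfSquares() with, say, a vector of int would get a linker error, because no code was ever generated for that type.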

There are additional advantages. If we write our entire library in header files, without any source files, we can include the library without having to build it separately. The compiler will only generate code for those parts of the library that we actually use, and thus we partially compile the library together with our own code. Since the library code ends up inside our executable, there is no separate library file to find at runtime and hence no runtime loading issues to deal with. Furthermore, if we only compile the part of the library we need, we keep our executable file size within reasonable limits.
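A header-only library can be as small as the following sketch. The file name clampfield.h and both function names are invented for this example. Templates can be implemented in the header directly, whereas ordinary functions should be marked inline so that the header can be included from several source files without causing duplicate definitions at link time.

```cpp
// --- clampfield.h (hypothetical single-header library) ---
#ifndef CLAMPFIELD_H
#define CLAMPFIELD_H

#include <vector>

// Templates can be fully implemented in a header.
template <typename T>
T clampValue(T value, T low, T high) {
  return value < low ? low : (value > high ? high : value);
}

// Non-template functions need 'inline', otherwise including this header in
// several source files leads to multiple-definition errors at link time.
inline void clampField(std::vector<double>& field, double low, double high) {
  for (double& value : field) value = clampValue(value, low, high);
}

#endif  // CLAMPFIELD_H
```

Using the library then only requires including clampfield.h; there is nothing to build or link separately.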

You can see header-only libraries as a compromise between static and dynamic libraries. They are very appealing for C++ programmers, as they can be very easily included in your code. For some libraries, it is as simple as downloading a single header file and dropping it into your project. That's all! There are no library build errors to debug, and there is no need to troubleshoot why your library can't be found at runtime, despite knowing that the library exists. The downside is that you have to compile the parts of the library you use every time you compile your code. With static and dynamic libraries you only compile them once.

Header include files

To bring libraries into your code, you need to include a header file that contains all required declarations for the functions/classes, so the compiler is aware of them at compile time. We have seen this already in previous examples: whenever we used a #include statement, this typically indicated including some library (either one from the C++ standard library or a third-party one). In this section, I want to provide you with an example to understand the include process in C++ and then look at what libraries do to conform to this process.

The compiler scope: what your compiler does and doesn't know

The functionality of this code is not very important, but in essence, we are just providing a function to initialise some fields, for example, the velocity field in the x-direction as shown here. The importance here is the order in which we write the function. Note that we have thus far always written the main function last. What happens if we turn this around? I.e. what happens if we write:

Using the GCC compiler, you will get the following error message upon compilation:

We get an error about the initialise() function not being declared (in this scope). But we can very clearly see it. So why is that? Your compiler goes through your code from top to bottom and adds functions to its scope as it encounters them, i.e. only from that point onwards is it aware of them. Remember that we said above that a function will be placed somewhere in memory, and when we call a function, the compiler will point to that location in memory so that the function can be executed. In this case, once we hit line 9, the compiler doesn't yet know about the function, so it will throw an error and say, in effect, "I don't know where you want me to go in memory".

Therefore, to overcome this issue, we have to provide a hint for the compiler, and we do that by providing the function declaration (its interface) first, and then the implementation later. This is shown in the following example:
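The listing is not reproduced here, so what follows is a minimal sketch under a few assumptions: the signature of initialise(), the field size, and the caller setUpVelocity() (which stands in for main in a complete program) are all made up for this illustration. The layout is chosen so that the interface sits on line 4, the call on line 11, and the implementation on lines 16-18.

```cpp
#include <vector>

// Declaration only: tells the compiler that initialise() exists.
void initialise(std::vector<double>& field, double value);

// Hypothetical caller for this sketch; in a full program this would be main().
std::vector<double> setUpVelocity() {
  // Velocity field in the x-direction with 10 entries.
  std::vector<double> u(10);
  // The compiler only knows the interface from line 4 here, which suffices.
  initialise(u, 1.0);
  return u;
}

// Implementation, which the compiler sees only after the call above.
void initialise(std::vector<double>& field, double value) {
  for (double& entry : field) entry = value;
}
```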

So now we provide the function declaration on line 4, and then when we hit line 11 during compilation, the compiler will still not know the implementation of the function, but it will know its interface (line 4). It is aware of the function and will continue to compile, and later, once we hit lines 16-18, it will add the entire function to its scope.

The library's responsibility: provide declarations during compilation

We saw in the previous example that we can provide function declarations (interfaces) for the compiler, and this will not interrupt the compilation process. In fact, your compiler doesn't care about the implementation of the function at all, as long as it knows the interface. Try the following: remove or comment out lines 16-18, and see what your compiler is telling you. On GCC, you get the following message:

There is one hint here for us: on line 3, we see that it is the ld process which returns an error. ld is your linker, not your compiler, but the compiler is typically nice enough to call the linker on our behalf. During compilation, the compiler only needs to be aware of function declarations, and it is the linker's job to then resolve these and to provide a link to where the actual implementation of these functions can be found. This is illustrated in the following diagram:

You can instruct the compiler to only generate the object files (*.o). Using GCC, you have to provide the -c compiler flag, to indicate that you only want to compile your source files, but not link them. If you do that, all error messages go away.

So during compilation, we only need the function declarations, and this is why we have to include a header file for each library we want to use. This header file will contain all function declarations for a library and any additional information like enums or other variables. If you are using a header-only library, you include all functions (with their full implementation) in your source files. Your compiler and linker will know about everything they need. For static libraries, on the other hand, the full implementations are looked up and copied into your executable during the linking stage. And finally, for dynamic libraries, the implementation is loaded at runtime; at link time, the linker typically only records which library is needed and checks that the required functions exist, without copying their code.

To summarise, the header files you include in your code from your library provide the function declarations at a minimum, but not necessarily the full code. The full functions are then looked up at different stages of the compilation and linking process, depending on the type of our library, i.e. header-only, static, or dynamic.

#include "file.h" vs. #include <file.h>

The final part I want to touch upon is the different #include statements that we can have. There are two options, both widely used and with different default behaviours. If you want to include a header file called library.h, you can use one of the following two forms: #include <library.h> or #include "library.h".

Using the <> brackets, you instruct your compiler to look into your platform-specific include directories. Each compiler will have a set of default #include directories and these are typically where the header files of the C++ standard library (often loosely called the standard template library, or STL) are located.

Using the quotation marks "" instead indicates to the compiler that you want to look for the specified file relative to the directory of the file that contains the #include statement first (most compilers, including GCC, then fall back to the directories searched for the <> form). You can provide relative or absolute paths here. Say, for example, that you are storing all of your third-party libraries within your code's project folder in a separate directory called libs, then you would have an #include statement like

#include "libs/library/library.h"

assuming that library.h is located in a sub-folder called library. You can also add additional directories to your compiler's default include paths through the -I flag. If you do that, you can use either <> brackets or quotation marks "" and you will see libraries using either one of these notations. For example, if you compile your code with GCC and include -I./libs/library/ as a compiler argument (notice the absence of a space between -I and ./libs/library/), then you could also write

#include "library.h"

I consider it good practice, though, to leave the <> brackets notation for the C++ standard library (and any other libraries provided with the compiler), while I like to use quotation marks "" for any third-party libraries.

Summary

Libraries are powerful and give us functionality, performance, and optimised code for free. Most large scientific libraries have been developed by C++ and domain experts with serious funding behind them to maintain and extend these libraries. Why would you not want to take advantage of these libraries? Do not reinvent the wheel, don't be a WET programmer, write DRY code. Getting to grips with libraries can be a challenge initially, but by the end of this series, you should be comfortable working with them and, hopefully, using them.

Tom-Robin Teschner is a senior lecturer in computational fluid dynamics and course director for the MSc in computational fluid dynamics and the MSc in aerospace computational engineering at Cranfield University.

Understanding static, dynamic, and header-only C++ libraries