How the C++ Compiler Works

 

This blog introduces how the C++ compiler works. C++ source files are just text files, and we need some way to turn them into an executable binary. That takes two operations: compiling and linking (linking will be covered in the next blog). What the compiler does is take each source file and convert it into an object file, which is a kind of intermediate format.

Let’s go ahead and see what the compiler does!

In the previous blog, we had two .cpp files, Log.cpp and main.cpp. Here is Log.cpp:

// Log.cpp

#include <iostream>

void Log(const char* message)
{
	std::cout << message << std::endl;
}

And here is main.cpp:

//main.cpp

#include <iostream>

void Log(const char* message);
// Or omit the parameter name, like this:
// void Log(const char*); 

int main()
{
	Log("Hello World!");
	std::cin.get();
}

After we build our program, we find two .obj files, Log.obj and main.obj, in our Debug folder. So what the compiler has done is generate one object file for each .cpp file.

These .cpp files are called translation units. In fact, C++ doesn’t care about files; files are just a way to feed source code to the compiler. We tell the compiler “Hey! This is a C++ file” by giving it the “.cpp” extension, and a file with the “.h” extension is treated as a header file. These are only default conventions: if you want to compile a file like “XX.random” as C++, you just need to tell the compiler to compile it as a .cpp file. To sum up, files are just files; they only provide the source code.

Each translation unit produces one .obj file. If we create a .cpp file that includes other .cpp files, the compiler treats the whole thing as one translation unit and creates only one .obj file.
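
To picture this, here is a minimal sketch of what one such combined translation unit might look like after the preprocessor has done its pasting. The file names Unity.cpp and Greet.cpp and the Greet function are hypothetical, made up for this illustration; only Log.cpp comes from this blog. Like the files in this blog, it has no main and is meant to be compiled on its own into one object file.

```cpp
// Hypothetical Unity.cpp *after* preprocessing. Before preprocessing it
// would contain only these two lines:
//     #include "Log.cpp"
//     #include "Greet.cpp"   // Greet.cpp is a made-up file for this sketch
// The preprocessor pastes both files in, so the compiler sees ONE
// translation unit and writes ONE object file (Unity.obj).

// ---- pasted from Log.cpp ----
#include <iostream>   // real preprocessor output would expand this too

void Log(const char* message)
{
	std::cout << message << std::endl;
}

// ---- pasted from the hypothetical Greet.cpp ----
void Log(const char* message);   // now redundant: the definition is already above

void Greet()
{
	Log("Hello World!");
}
```

Because both pieces end up in the same translation unit, the call to Log in Greet is resolved against a definition the compiler has already seen, instead of one in a separate .obj file.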

Here is an even simpler example. We create a new file called “Math.cpp”, shown below:

//Math.cpp
int Multiply(int a, int b) 
{
	int result = a * b;
	return result;
}

After pressing “Ctrl+F7” (which compiles just this one file in Visual Studio), we get Math.obj in our output folder. Before we look at what exactly is in the .obj file, let's first talk about the first stage of compilation: preprocessing.

Pre-processing

During the preprocessing stage, the compiler goes through all of our preprocessor directives (the lines that start with #). The common ones are #include, #define, #if, #ifdef and #pragma.

#include

First, let's look at one of the most common preprocessor directives: #include.

#include is really simple: the preprocessor just opens the file we include, reads all of its contents and pastes them into the file where we wrote #include.

To prove that, let's try an example.

We add a header file called “EndBrace.h” to our project, containing nothing but a “}”.

//EndBrace.h
}

Then we go back to Math.cpp and replace the closing curly brace with #include "EndBrace.h".

//Math.cpp

int Multiply(int a, int b) 
{
	int result = a * b;
	return result;
#include "EndBrace.h"

We press “Ctrl+F7” to compile it, and it compiles successfully. So #include really does nothing more than copy and paste the entire contents of the specified file.

#define

Let's try another example: using #define to replace “INTEGER” with “int”. #define is a plain substitution: the preprocessor reads through the code that follows and replaces every occurrence of the macro name (the first word, INTEGER) with the text after it (int).

//Math.cpp

#define INTEGER int

INTEGER Multiply(int a, int b)
{
	INTEGER result = a * b;
	return result;
}
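
The sketch below repeats the INTEGER substitution and adds two helper macros, STR and XSTR (hypothetical names, not from this blog), that turn the expansion into a string so we can check it at compile time. It also shows that the replacement works on whole tokens: a string literal that happens to spell INTEGER is left alone. Like Math.cpp, it has no main and is meant to be compiled on its own; it assumes C++14 or later for the constexpr function body.

```cpp
#include <type_traits>

// Same substitution as in Math.cpp: every later INTEGER token becomes int.
#define INTEGER int

// constexpr is added only so the result can be checked with static_assert.
constexpr INTEGER Multiply(int a, int b)
{
	INTEGER result = a * b;
	return result;
}

// After preprocessing, the return type is plain int.
static_assert(std::is_same<decltype(Multiply(1, 1)), int>::value,
              "INTEGER was replaced with int");

// Hypothetical helper macros: XSTR expands its argument first,
// then STR turns the result into a string literal.
#define STR(x) #x
#define XSTR(x) STR(x)

// Substitution works on tokens, not on the text inside string literals:
constexpr char kInsideQuotes[] = "INTEGER";  // string literal: left untouched
constexpr char kExpanded[] = XSTR(INTEGER);  // becomes "int"
```

The two-level XSTR/STR trick is needed because the # operator stringizes its argument as written; going through XSTR first lets INTEGER expand to int before it is turned into a string.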

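Here we can change a project property so that the preprocessed output is written to a file (in Visual Studio: C/C++ > Preprocessor > Preprocess to a File, i.e. /P), as in the following figure.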

Then we open “Math.i” in our folder and find that every “INTEGER” has been replaced with “int”.

#if

The #if directive lets us include or exclude code based on a given condition. If we write #if 1, the condition is always true.

//Math.cpp

#if 1

int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}

#endif

After compiling, we can check our preprocessed file, which looks exactly like the result without the #if 1 statement.

If we turn the code off with #if 0:

//Math.cpp

#if 0

int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}

#endif

The preprocessed file “Math.i” then looks like this: the Multiply function is gone, because our code was disabled.
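
Here is a small sketch of the same idea driven by a macro instead of a hard-coded 1 or 0, plus #ifdef from the list of directives at the start of this section. ENABLE_MULTIPLY is a hypothetical macro name chosen for this example, and constexpr (C++11 or later) is used only so the results can be checked with static_assert.

```cpp
// Hypothetical on/off switch: change it to 0 and Multiply disappears
// from the code the compiler sees, just like wrapping it in #if 0.
#define ENABLE_MULTIPLY 1

#if ENABLE_MULTIPLY
constexpr int Multiply(int a, int b)
{
	return a * b;
}
#endif

#if 0
// Removed by the preprocessor before compilation, so this undefined
// name never causes a compile error.
int Broken() { return this_name_does_not_exist; }
#endif

// #ifdef only asks whether the macro is defined at all, not what its value is.
#ifdef ENABLE_MULTIPLY
constexpr bool kMultiplyEnabled = true;
#else
constexpr bool kMultiplyEnabled = false;
#endif
```

Note the difference: with ENABLE_MULTIPLY defined as 0, #if ENABLE_MULTIPLY would drop Multiply, but #ifdef ENABLE_MULTIPLY would still be true.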

That’s all for the preprocessor. If you are interested, check the preprocessed file after adding “#include <iostream>”. You will find it becomes huge, because <iostream> has a lot of content and also includes other files.

Obj file

Now, let’s change the property setting back and compile our .cpp file again, which gives us “Math.obj” in the folder. Let’s check what’s inside the .obj file. Unfortunately, it’s a binary file, so we cannot read it directly.

//Math.cpp

int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}

So, let’s convert it to a more readable form. We can open “Properties” again and set “Assembler Output” to “Assembly-Only Listing (/FA)”.

Then you will find a “Math.asm” file in our “Debug” folder, which is basically a readable version of what the object file contains. Let’s check the critical part of this file.

[-> 1]. Move variable a into “eax”.

[-> 2]. Multiply “eax” by variable b.

[-> 3]. Then move “eax” into variable result.

[-> 4]. Move result back into “eax” to return it.
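
To see why steps 3 and 4 are wasted work, here is a rough C++ model of the four steps, treating “eax” as an ordinary local variable. The instruction names in the comments are typical of MSVC's unoptimized x86 output, not copied from Math.asm, and the function names are hypothetical; the constexpr bodies need C++14 or later.

```cpp
// "eax" below is an ordinary local variable standing in for the CPU
// register; the comments show the kind of instruction an unoptimized
// (Debug) x86 build emits for each step.
constexpr int MultiplyLikeDebugAsm(int a, int b)
{
	int eax = a;         // [1] mov  eax, a
	eax = eax * b;       // [2] imul eax, b
	int result = eax;    // [3] mov  result, eax  (store to the local variable)
	eax = result;        // [4] mov  eax, result  (load it right back)
	return eax;          // the caller finds the return value in eax
}

// Dropping steps [3] and [4] does not change the result,
// which is what writing "return a * b;" achieves.
constexpr int MultiplyWithoutRoundTrip(int a, int b)
{
	int eax = a;         // [1]
	eax = eax * b;       // [2]
	return eax;
}

static_assert(MultiplyLikeDebugAsm(6, 7) == MultiplyWithoutRoundTrip(6, 7),
              "steps [3] and [4] are redundant");
```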

We find that steps 3 and 4 are actually redundant. This is also an example of why we need optimization during compilation. If we change our code to:

//Math.cpp

int Multiply(int a, int b)
{
	return a * b;
}

After we compile it, “Math.asm” changes as shown below, and only two steps are needed.

Optimization

We will notice there is a lot of code in our .asm file. That’s because optimization is turned off in Debug mode. Here we can temporarily change our properties as follows to enable optimization during compilation.

First, we set “Optimization” to “Maximum Optimization (Favor Speed) (/O2)” under the “Debug” configuration, as shown below.

If we compile now, we get an error, because /O2 cannot be combined with the /RTC1 runtime checks that the Debug configuration turns on. So we have to change one more setting: set “Basic Runtime Checks” under “Code Generation” to “Default”.

Then, if we compile it, we find the “Math.asm” file is much simpler than before.

Now let’s take a different example: here the function takes no input and just returns 5*2.

//Math.cpp

int Multiply()
{
	return 5 * 2;
}

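Let’s compile it and check the “Math.asm” file. It just moves 10 into the “eax” register, which holds our return value. So the compiler simplified “5*2” to “10” at compile time; this is called constant folding. (For an expression made only of literals like this one, compilers typically fold it even with optimization turned off.)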
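
With constexpr (C++11 or later) we can go one step further and require the compiler to compute 5 * 2 while compiling, then check the result with static_assert. This is a sketch for illustration; the compiler did not need constexpr to fold the constant in Math.asm, and kProduct is a hypothetical name.

```cpp
// Same function as above, marked constexpr so the compiler must be
// able to evaluate it at compile time.
constexpr int Multiply()
{
	return 5 * 2;
}

// Computed while compiling; no multiply instruction is needed at run time.
constexpr int kProduct = Multiply();

static_assert(kProduct == 10, "5 * 2 was folded to 10 at compile time");
```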

Let’s take another example, where we add a “Log” function.

//Math.cpp

const char* Log(const char* message) 
{
	return message;
}

int Multiply(int a, int b)
{
	Log("Multiply");
	return a * b;
}

After compilation, we find that the Log function just moves “message” into the eax register.

Moving on to the Multiply function, we find that all it does is call Log before the multiplication.

Notice that the “Log” function in the .asm file is decorated with a string of strange characters and symbols. This decorated name encodes the function signature and is used to uniquely identify our function; we will talk more about it in the next blog, on linking.