1. C Overview

1. The C language

C++, the programming language of this course, is often understood as an extension of the more elementary C programming language. Even though this is not completely true, many concepts and syntax choices in C++ are inherited from C. Over time, “modern” alternatives have been built into the C++ language. As a consequence, it is often possible to write C++ code either in a “modern” way or in a more classical, C-style way. In these situations it is good to understand a bit of C. In this chapter we go over some C features that have been integrated almost directly into C++.

1.1 A minimal C program

#include <stdio.h>

int main() {
    printf("Hello World!\n");
    return 0; // everything after // or in between /* and */ is ignored
}

The main() function is the entry point of a C/C++ program. The program terminates when the return 0; line is reached in the body of main(). #include <...> allows external code to be included in the program; in this case, stdio.h declares printf().

1.2 Types & numbers

To store or pass a value in a C program, you need to declare the type of the value. These are common types that are built into the language:

  • integers ($\mathbb{Z}$), such as int and long;
  • unsigned integers ($\mathbb{N}_0$), such as unsigned int and unsigned long, which are similar but cannot store negative numbers;
  • floating-point numbers ($\mathbb{R}$), such as float and double;
  • void, an incomplete type (it has no size) that is used, among other things, for a function that does not return a value.

Good to know:

  • The long and double types typically use more memory than int and float, and can then store larger values (double is also more precise than float). Longer and shorter types exist as well.
  • Calculations with variables of different types often convert values silently. This is called implicit conversion or implicit casting. For example:
        int a = 3;
        // First integer division (truncates toward zero), then implicit conversion
        double b = a / 10; // 0.0 (!)
        // First explicit cast to double, then floating-point division
        double c = (double) a / 10; // 0.3, as expected
    
  • The number of bits that a type uses depends on the operating system and hardware. From C99 onwards, the standard includes so-called fixed-width integer types (declared in <stdint.h>, for example int32_t), which guarantee an exact width.

1.2.1 How numbers are stored in memory

  • On most architectures and operating systems, an int is stored using 32 bits. This holds even on 64-bit processors, for reasons of backwards compatibility.
  • int is stored in a binary format called two’s complement. Examples:
    00000000 00000000 00000000 00000001 (1)
    00000000 00000000 00000000 00000101 (5)
    11111111 11111111 11111111 11111111 (-1)
    11111111 11111111 11111111 11111110 (-2)
    11111111 11111111 11111111 11111011 (-5)
    
    The first (leftmost) bit is the sign bit, and indicates whether the number is negative (1) or non-negative (0). Non-negative numbers are stored “the way you write them down” in binary. A negative number is obtained from its positive counterpart by flipping all bits and then adding one: 5 is ...00000101, flipping gives ...11111010, and adding one gives ...11111011, which is -5. This allows the processor to use the same arithmetic logic on numbers, regardless of sign. To see this, try adding -2 and 1 above; the result is -1.
  • Types are merely a label for the data in memory. This gives more freedom, but using them in the wrong way leads to unexpected results.
    // x is of the type `unsigned int`
    unsigned int x = 5; // ...000000101 in memory
    // subtracting 10 even though x is unsigned: the result wraps around
    x = x - 10; // ...11111011 in memory, the same bits as -5 in two's complement
    printf("%u\n", x); // read as unsigned, this prints 4294967291 (with a 32-bit int)!
    
  • Floating-point numbers (approximations of real numbers) are represented according to the IEEE 754 standard. For example, a 32-bit float with value 0.15625 is stored as a number $a\cdot 2^b$:
    00111110 00100000 00000000 00000000 (0.15625)
    ^                                   the sign of a
     ^^^^^^^ ^                          b
              ^^^^^^^ ^^^^^^^^ ^^^^^^^^ a
    
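    Here the 8 bits of b store the exponent plus an offset of 127 (01111100 is 124, so $b = -3$), and the 23 bits of a store the fraction after an implicit leading 1 (so $a = 1.25$), giving $1.25\cdot 2^{-3} = 0.15625$. The same scheme applies to 64-bit numbers (usually type double), with more bits for both a and b.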

1.3 Functions, control structures, loops

Like most other languages, C has functions. Here is an example program.

#include <stdio.h> // needed for printf()

// function definition.
// defines a function with the name "sum" and with the body "{ return x+y; }"
int sum(int x, int y) {
    return x + y;
}

// function that doesn't return anything uses `void`
void say_hi() {
    printf("hi!\n");  // `\n` is a newline character
}

int main() {
    say_hi(); // function call, prints "hi!"
    int z = sum(4, 2);
    printf("z = %d\n", z); // prints "z = 6".
    return 0;
}

/* source modified from https://en.cppreference.com/w/c/language/functions. */

Note that sum is the name (identifier) of the function, and that x and y are parameters of type int.

Moreover C has several loops (iteration statements) and control structures (selection statements):

  • Loops: for, while, and do-while.
  • Selection: if-else and switch statements.

1.4 Declarations, initializations, definitions, expressions, and statements

  • A declaration states that a variable or function (in general, an identifier) exists, and what it is (by giving it a type, for instance). A declaration by itself does not reserve memory, and the identifier is not yet ready to be used.

    // variable declaration, the identifier is `a`; `extern` says it is defined elsewhere
    extern int a;
    
    // a function declaration, identifier `max`. Can't be used yet!
    int max(int, int);
    
  • An initialization is when an identifier gets an initial value. This is possible through an “initializer”, or it can be done implicitly, through a first assignment.

    int a = 3; // this is not assignment: this is a declaration with initializer!
    
    int b; // declaration
    b = 3; // assignment, implicit initialization
    

    In C++ we will learn that the two approaches can have distinctly different results.

  • A definition is a declaration that provides everything needed for the identifier to be used. The compiler then allocates storage for it.

    int a = 3; // a declaration with initializer is a definition
    
    // a function with a body is a definition
    double sum(double a, double b) { return a + b; }
    
  • An expression is a computation on variables, some examples:

    • assignment, like a = b;
    • arithmetic operations like addition, a + b, subtraction, a - b, …;
    • relational operations, a > b, a <= b, or a == b, …;
    • logical operators, like && (and), || (or);
    • bitwise operators, like & (bitwise AND), | (bitwise OR), >> (bitshift right), and << (bitshift left);
    • function calls, variable conversions, among many other things, are also expressions.

    An expression may or may not return a result.

  • Each C program is written as a sequence of statements. Statements are classified in categories, such as expression statements, block statements, selection statements, etc. Expression statements end with ;. Example:

    /* Modified https://en.cppreference.com/w/c/language/statements */
    #include <stdio.h>

    void test(void); // declaration, so that main() can call test()

    int main()
    { // start of a compound statement, also called a "block"
        test();  // a function call is an expression, so this is an expression statement
        return 0; // return statement
    } // end of compound statement
    
    void test()
    { // start of compound statement
        // declaration with initializer 
        int n = 1; // not an assignment, not a statement!
        n = n+1; // expression statement
    
        if (n > 1) // `n > 1` is an expression
            // the whole `if` is a selection statement; its body is an expression statement:
            printf("n is greater than 1.\n"); 
    
        printf("n = %d\n", n); // expression statement
    } // end of compound statement
    

    It is a useful skill to be able to mentally disassemble a program and tell expressions, statements, declarations, definitions and the like apart.

1.5 Arrays

An array contains a number of elements that are all of the same type. An array can be initialized from a brace-enclosed list.

int numbers[3]; // declares `numbers` to be an array of 3 ints (not yet initialized)

int temperatures[3] = {9,-3,0}; // `temperatures` holds 9,-3,0
temperatures[2] = 5; // `temperatures` now holds 9,-3,5
// note: third element is in position 2, counting starts at 0!

Good to know:

  • Once an array has been declared, its size cannot be changed. The reason is that the compiler needs to know the size of the array when the program is compiled.
  • In C++, these arrays are often called “C-style arrays”. They are often replaced by types that provide more flexibility or functionality.

1.6 Pointers

Pointers require some getting used to. Pointers are types, like an int or float. But rather than storing an integer or float value, a pointer stores a memory address, pointing to an integer or float. A pointer type is denoted by an asterisk after the type of the value that it points to, e.g. float *, or unsigned int *. Pointers are a useful mechanism for passing data in bigger applications, and they form the basis for later C++ concepts.

To get a memory address (to be stored in a pointer), one option is the address-of operator & (a.k.a. the reference operator).

int a = 5; // declare and initialize `a`
int * ptr; // declare a pointer named `ptr`

// using the address-of operator:
ptr = &a; // `ptr` now contains the memory address of `a`!
printf("Address: %p\n", (void *) ptr); // prints the address, typically in hexadecimal

Of course, pointers would not be very useful if one couldn’t retrieve the value that is stored behind them. To go from a memory address to a value, there is the dereference operator *.

// assume `ptr` contains the memory address of `a`
int b = *ptr; // `b` now contains 5

It may be a bit puzzling that the asterisk is used for pointer declarations, for multiplication and for dereferencing, but the compiler can always tell these uses apart from the context.

1.6.1 Using pointers with functions

When a function is called, all the arguments to the function are copied into its parameters. Therefore:

  • modifying a parameter inside the function has no effect on the variable (if any) that was passed in;
  • copying large types is resource intensive.

For these reasons it is often preferable to pass a pointer. Then the pointer itself is copied, but the data that it points to is not. The next example does not copy the some_numbers array into the function multiply_by_two: when an array is passed to a function, it is converted to a pointer to its first element.

#include <stdio.h>

void multiply_by_two(int * data, int size) {  // takes a pointer
    for (int i = 0; i < size; i++) // i++ increments the value of i by 1
        data[i] *= 2;  // shorthand for data[i] = data[i] * 2;
}

int main() {
    int some_numbers[5] = {3,7,9,11,13};
    multiply_by_two(some_numbers, 5); // passing a pointer to `some_numbers`.
    printf("%d\n", some_numbers[2]); // prints 18!
    return 0;
}

Note that multiply_by_two does not return anything; it changes the data array directly.

1.7 Structs and objects

When we get to C++, we will see that object-oriented programming is all about inventing new types. This is, to a limited extent, also possible in C. Examples of these are structs (and unions, not treated here).

A structure (struct) is a type that is composed of other types, a so-called compound type. To declare one, use struct <Your-type-name>, followed by a list of members. Let’s look at an example.

// a declaration of a new struct, named `Person`
// again: we declare a new *type*, not a new variable!
struct Person {
    int age;
    float height;
}; // note the semicolon: a struct declaration must end with one

// a declaration with initializer of identifier `anna`
struct Person anna = {32, 1.75};
anna.height = 1.76; // make anna 1 cm taller
printf("Anna is %d years old and %f meters tall.\n", anna.age, anna.height);

// another initialization of the new type
struct Person tom = {24, 1.51};

The newly created type is struct Person. Two variables of this type were then initialized, one for each person. Each initialization of a struct Person led to an object. An object is what results from the initialization, in this case of a compound type, and is here essentially “a bag of values” corresponding to the members of the struct. The scalars age and height that are stored in the objects tom or anna can be obtained via the member access operator ., cf. anna.height in the example above, or tom.age.

Good to know:

  • When working with a pointer to an object, you first have to dereference the pointer and then access the members of the object, e.g. (*ptr).height. The arrow operator -> is a shorthand for this: ptr->height is equivalent.