3. Basic Types and Operators

C provides a standard, though minimal, set of basic data types. Sometimes these are called "primitive" types. This chapter focuses on defining and showing examples of using the various types, including operators and expressions that can be used. More complex data structures can be built up from the basic types described below, as we will see in later chapters.

3.1. Integer types

There are several integer types in C, differing primarily in their bit widths and thus the range of values they can accommodate. Each integer type can also be signed or unsigned. Signed integer types have a range \(-2^{width-1}..2^{width-1}-1\), and unsigned integers have a range \(0..2^{width}-1\). There are five basic integer types:

char: One ASCII character

The size of a char is almost always 8 bits, or 1 byte. 8 bits provides a signed range of -128..127 or an unsigned range of 0..255, which is enough to hold a single ASCII character [1]. char is also required to be the "smallest addressable unit" for the machine --- each byte in memory has its own address.

short: A "small" integer

A short is typically 16 bits, which provides a signed range of -32768..32767. It is less common to use a short than a char, int, or something larger.

int: A "default" integer size

An int is typically 32 bits (4 bytes), though it is only guaranteed to be at least 16 bits. In many microcontroller environments, an int is only 16 bits! It is defined to be the "most comfortable" size for the computer architecture for which the compiler is targeted. If you do not really care about the range for an integer variable, declare it int since that is likely to be an appropriate size which works well for that machine.

long: A large integer

At least 32 bits. On a 32-bit machine, it will usually be 32 bits, but on a 64-bit machine, it will usually be 64 bits.

long long

Modern C compilers (C99 and later) also support long long, an integer type that is guaranteed to be at least 64 bits wide. In practice it is a 64-bit integer.

The integer types can be preceded by the qualifier unsigned which disallows representing negative numbers and doubles the largest positive number representable. For example, a 16 bit implementation of short can store numbers in the range -32768..32767, while unsigned short can store 0..65535.

Although it may be tempting to use unsigned integer types in various situations, you should generally just use signed integers unless you really need an unsigned type. Why? The main reason is that it is common to write comparisons like x < 0, but if x is unsigned, this expression can never be true! A good compiler will warn you in such a situation, but it's best to avoid it to begin with. So, unless you really need an unsigned type (e.g., for creating a bitfield), just use a signed type.

3.1.1. The sizeof keyword

There is a keyword in C called sizeof that works like a function and returns the number of bytes occupied by a type or variable. If there is ever a need to know the size of something, just use sizeof. Here is an example of how sizeof can be used to print out the sizes of the various integer types on any computer system. Note that the %zu format placeholder in each of the format strings to printf means "a value of type size_t", which is the unsigned integer type that sizeof returns. (As an exercise, change %zu to %d and recompile with clang. It will helpfully tell you that something is fishy with the printf call.)

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char c = 'a';
    short s = 0x7eef;              // a value that fits in a 16-bit signed short
    int i = 100000;
    long l = 100000000L;
    long long ll = 60000000000LL;
    // sizeof yields a size_t, which is printed with %zu
    printf("A char is %zu bytes\n", sizeof(c));
    printf("A short is %zu bytes\n", sizeof(s));
    printf("An int is %zu bytes\n", sizeof(i));
    printf("A long is %zu bytes\n", sizeof(l));
    printf("A long long is %zu bytes\n", sizeof(ll));
    return EXIT_SUCCESS;
}

When the above program is run on a 32-bit machine [2], the output is:

A char is 1 bytes
A short is 2 bytes
An int is 4 bytes
A long is 4 bytes
A long long is 8 bytes

and when the program is run on a 64-bit machine, the output is:

A char is 1 bytes
A short is 2 bytes
An int is 4 bytes
A long is 8 bytes
A long long is 8 bytes

Notice that the key difference above is that on a 64-bit platform, the long type is 8 bytes (64 bits), but only 4 bytes (32 bits) on a 32-bit platform.

3.1.2. char literals

A char literal is written with single quotes (') like 'A' or 'z'. The char constant 'A' is really just a synonym for the ordinary integer value 65, which is the ASCII value for uppercase 'A'. There are also special case char constants for certain characters, such as '\t' for tab, and '\n' for newline.

'A'

Uppercase 'A' character

'\n'

Newline character

'\t'

Tab character

'\0'

The "null" character --- integer value 0 (totally different from the char digit '0'!). Remember that this is the special character used to terminate strings in C.

'\012'

The character with value 12 in octal, which is decimal 10 (and corresponds to the newline character). Octal representations of chars and integers show up here and there, but are not especially common any more.

0x20

The character with hexadecimal value 20, which is 32 in decimal (and corresponds to the space ' ' character). Written this way it is an ordinary integer constant; the equivalent char literal is '\x20'. Hexadecimal representations of chars and integers are fairly common in operating systems code.

3.1.3. int literals

Numbers in the source code such as 234 default to type int. They may be followed by an 'L' (upper or lower case) to designate that the constant should be a long, such as 42L. Similarly, an integer literal may be followed by 'LL' to indicate that it is of type long long. Adding a 'U' before 'L' or 'LL' can be used to specify that the value is unsigned, e.g., 42ULL is an unsigned long long type.

An integer constant can be written with a leading 0b to indicate that it is expressed in binary (base 2). For example 0b00010000 is the way to express the decimal number 16 in binary. (Binary constants were long a compiler extension, supported by gcc and clang, and only became part of the standard in C23.) Similarly, a leading 0x is used to indicate that a value is expressed in hexadecimal (base 16) --- 0x10 is a way of expressing the decimal number 16. Lastly, a constant may be written in octal (base 8) by preceding it with 0 (single zero) --- 012 is a way of expressing the decimal number 10. A pitfall related to octal notation is that if you accidentally write a decimal value with a leading 0, the C compiler will interpret it as a base-8 value!

3.1.3.1. Type combination and promotion

The integral types may be mixed together in arithmetic expressions since they are all basically just integers. That includes the char type (unlike Java, in which the byte type would need to be used to specify a single-byte integer). For example, char and int can be combined in arithmetic expressions such as ('b' + 5). How does the compiler deal with the different widths present in such an expression? In such a case, the compiler "promotes" the smaller type (char) to be the same size as the larger type (int) before combining the values. Promotions are determined at compile time based purely on the types of the values in the expressions. Promotions do not lose information --- they always convert from one type to a compatible, larger type to avoid losing information. However, an assignment (or explicit cast) from a larger type to smaller type (e.g., assigning an int value to a short variable) may indeed lose information.

3.2. Floating point types

float

Single precision floating point number. Typical size: 32 bits (4 bytes).

double

Double precision floating point number. Typical size: 64 bits (8 bytes).

long double

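An extended-precision floating point number, possibly even bigger than a double (somewhat obscure). Its size and precision depend on the platform: on x86-64 Linux and macOS machines it occupies 128 bits (16 bytes) of storage, although that is not necessarily a true "quad-precision" format.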

Constants in the source code such as 3.14 default to type double unless they are suffixed with an 'f' (float) or 'l' (long double). Single precision equates to about 6 digits of precision and double is about 15 digits of precision. Most C programs use double for their computations, since the additional precision is usually well worth the additional 4 bytes of memory usage. The only reason to use float is to save on memory consumption, but in normal user programs the tradeoff just isn't worth it.

The main thing to remember about floating point computations is that they are inexact. For example, what is the value of the following double expression?

(1.0/3.0 + 1.0/3.0 + 1.0/3.0) // is this equal to 1.0 exactly?

The sum may or may not be 1.0 exactly, and it may vary from one type of machine to another. For this reason, you should never compare floating point numbers to each other for equality (==) --- use inequality (<) comparisons instead. Realize that a correct C program run on different computers may produce slightly different outputs in the rightmost digits of its floating point computations.

3.3. Boolean type

In C prior to the C99 standard, there was no distinct Boolean type. Instead, integer values were used to indicate true or false: zero (0) means false, and anything non-zero means true. So, the following code:

int i = 0;
while (i - 10) {

    // ...

}

will execute until the variable i takes on the value 10 at which time the expression (i - 10) will become false (i.e., 0).

In the C99 revision, a bool type was added to the language, but the vast majority of existing C code uses integers as quasi-Boolean values. In C99, you must add #include <stdbool.h> to your code to gain access to the bool type. Using the C99 bool type, we could modify the above code to use a Boolean flag variable as follows:

#include <stdbool.h>

// ...

int i = 0;
bool done = false;
while (!done) {

    // ...

    done = (i - 10 == 0);
}

3.4. Basic syntactic elements

3.4.1. Comments

Comments in C are enclosed by slash/star pairs:

/* .. comments .. */

which may cross multiple lines. C++ introduced a form of comment started by two slashes and extending to the end of the line:

// comment until the line end

The // comment form is so handy that it was adopted into the C language itself as of the C99 standard, and all modern C compilers support it.

Along with well-chosen function names, comments are an important part of well written code. Comments should not just repeat what the code says. Comments should describe what the code accomplishes which is much more interesting than a translation of what each statement does. Comments should also narrate what is tricky or non-obvious about a section of code.

3.4.2. Variables

As in most languages, a variable declaration reserves and names an area in memory at run time to hold a value of a particular type. Syntactically, C puts the type first followed by the name of the variable. The following declares an int variable named "num" and stores the value 42 into it:

int num = 42;

[Figure: memory box diagram. A simple memory diagram for int num = 42;]

A variable corresponds to an area of memory which can store a value of the given type. Making a drawing is an excellent way to think about the variables in a program. Draw each variable as a box with the current value inside the box. This may seem like a "newbie" technique, but when you are buried in some horribly complex programming problem, it will almost certainly help to draw things out as a way to think through the problem. Embrace your inner noob.

Names in C are case sensitive so "x" and "X" refer to different variables. Names can contain digits and underscores (_), but may not begin with a digit. Multiple variables can be declared after the type by separating them with commas. C is a classical "compile time" language --- the names of the variables, their types, and their implementations are all fleshed out by the compiler at compile time (as opposed to figuring such details out at run time like an interpreter).

3.4.3. Assignment Operator =

The assignment operator is the single equals sign (=):

i = 6;
i = i + 1;

The assignment operator copies the value from its right hand side to the variable on its left hand side. The assignment also acts as an expression which returns the newly assigned value. Some programmers will use that feature to write things like the following:

y = x = 2 * x;  // double x, and also put x's new value in y

3.4.3.1. Demotion on assignment

The opposite of promotion, demotion moves a value from a type to a smaller type. This is a situation to be avoided, because when the value does not fit in a signed destination type the result is, strictly speaking, implementation-defined. In other words, the language does not guarantee what will happen, and it may be different depending on the compiler used. A common behavior is for any extra bits to be truncated, but you should not depend on that. (When the destination type is unsigned, the result is well-defined: the value wraps around modulo 2^width.) At least a good compiler (like clang) will generate a compile time warning in this type of situation.

The assignment of a floating point value to an integer type will truncate the fractional part of the number. This happens when assigning a floating point number to an integer or passing a floating point number to a function which takes an integer. If the integer portion of a floating point number is too big to be represented in the integer being assigned to, the result is the ghastly undefined (see [4]). Most modern compilers will warn about implicit conversions like the one in the code below, but not all. The following code will set i to the value 3:

int i;
i = 3.14159; // truncation of a float value to int

3.4.4. Arithmetic operations

C includes the usual binary and unary arithmetic operators. It is good practice to use parentheses if there is ever any question or ambiguity surrounding order of operations. The compiler will optimize the expression anyway, so as a programmer you should always strive for maximum readability rather than some perceived notion of what is efficient or not. The operators are sensitive to the type of the operands. So division (/) with two integer arguments will do integer division. If either argument is a floating point value, it does floating point division. So (6/4) evaluates to 1 while (6/4.0) evaluates to 1.5 --- the 6 is promoted to 6.0 before the division.

Operator   Meaning
+          Addition
-          Subtraction
/          Division
*          Multiplication
%          Remainder (mod)

3.4.5. Unary Increment Operators: ++ and --

The unary ++ and -- operators increment or decrement the value in a variable. There are "pre" and "post" variants for both operators which do slightly different things (explained below).

Operator   Meaning
var++      increment "post" variant
++var      increment "pre" variant
var--      decrement "post" variant
--var      decrement "pre" variant

An example using post increment/decrement:

int i = 42;
i++;     // increment on i
// i is now 43
i--;     // decrement on i
// i is now 42

3.4.5.1. Pre- and post- variations

The pre-/post- variation has to do with nesting a variable with the increment or decrement operator inside an expression --- should the entire expression represent the value of the variable before or after the change? These operators can be confusing to read in code and are often best avoided, but here is an example:

int i = 42;
int j;
j = (i++ + 10);
// i is now 43
// j is now 52 (NOT 53)
j = (++i + 10);
// i is now 44
// j is now 54

3.4.6. Relational Operators

These operate on integer or floating point values and return a 0 or 1 boolean value.

Operator   Meaning
==         Equal
!=         Not Equal
>          Greater Than
<          Less Than
>=         Greater or Equal
<=         Less or Equal

To see if x equals three, write something like if (x==3) ....

3.4.7. Logical Operators

The value 0 is false; anything else is true. The operators evaluate left to right and stop as soon as the truth or falsity of the expression can be deduced (such operators are called "short circuiting"). In ANSI C, these operators are furthermore guaranteed to produce 1 to represent true, and not just some random non-zero bit pattern. However, there are many C programs out there which use values other than 1 for true (non-zero pointers for example), so when programming, do not assume that a true boolean is necessarily 1 exactly.

Operator   Meaning
!          Boolean not (unary)
&&         Boolean and
||         Boolean or

3.4.8. Bitwise Operators

C includes operators to manipulate memory at the bit level. This is useful for writing low-level hardware or operating system code where the ordinary abstractions of numbers, characters, pointers, etc. are insufficient. Using bitwise operators is very common in microcontroller programming environments and in some "systems" software.

Bit manipulation code tends to be less "portable". Code is "portable" if with no programmer intervention it compiles and runs correctly on different types of processors. The bitwise operations are typically used with unsigned types. In particular, the shift operations are guaranteed to shift zeroes into the newly vacated positions when used on unsigned values.

Operator   Meaning
~          Bitwise NOT (unary): flip 0 to 1 and 1 to 0 throughout
&          Bitwise AND
|          Bitwise OR
^          Bitwise XOR (Exclusive OR)
>>         Right Shift by right hand side (RHS) (divide by power of 2)
<<         Left Shift by RHS (multiply by power of 2)

Do not confuse the bitwise operators with the logical operators. The bitwise connectives are one character wide (&, |) while the boolean connectives are two characters wide (&&, ||). The bitwise operators have higher precedence than the boolean operators. The compiler will not typically help you out with a type error if you use & when you meant &&.

3.4.8.1. Bitwise operation example

Say we want to set certain bits in a byte. In particular, say that (starting at 1, on the far right) we want to set the 2nd and 5th bits so that the byte is equal to 0b00010010 (hex 0x12). The following program would do that:

 1#include <stdio.h>
 2
 3#define SECOND 1
 4#define FIFTH 4
 5
 6int main() {
 7    unsigned char flags = 0b00000000;
 8    flags = (1<<SECOND);          // light up the 2nd bit
 9    flags = flags | (1 << FIFTH); // and the 5th
10    printf("0x%x\n", flags);
11    return 0;
12}

Lines 3 and 4 in the code segment above create two macro substitutions (#define is a C preprocessor directive). During the preprocessing phase of compilation, anywhere SECOND appears will be replaced with the value 1; similarly for the text FIFTH. So why is SECOND defined as 1 and not 2 (and similar for FIFTH)? We will come to that shortly.

In main, the variable flags is assigned all zeroes (notice the binary literal) on line 7, then on line 8 we assign to flags by shifting a 1 SECOND places to the left. Since SECOND is defined as 1, we shift 0b00000001 one place to the left, giving 0b00000010. So perhaps that's an answer to the question above: the values of SECOND and FIFTH define the number of positions to shift a 1 to the left. On line 9, we do the same thing for FIFTH but we also perform a bitwise OR with the existing value of flags. The bitwise OR operation provides a way to combine (or union) two or more values together. For example 0x01 | 0xF0 is 0xF1.

Although not shown in the example above, if we wanted to check whether the fifth bit in a byte is set (again, starting at 1 counting from the right), we might use the following expression: fifth_is_set = flags & (1<<FIFTH);. Doing a bitwise AND is referred to as masking since ANDing anything with 0b00010000 will mask (unset) any bits that may have been set other than the fifth bit. Similarly, if we wanted to unset a particular bit but leave all others unchanged, we could create a mask like this: ~(1<<FIFTH) which has a bit-level representation of 0b11101111. Performing an AND with that mask and any byte would leave all bits except the 5th as-is while setting the 5th to 0.

3.4.9. Other Assignment Operators

In addition to the plain = operator, C includes many shorthand operators which represent variations on the basic =. For example += adds the right hand side to the left hand side. x = x + 10 can be reduced to x += 10. Note that these operators are much like similar operators in other languages, like Python and Java. Here is the list of assignment shorthand operators:

Operator     Meaning
+=, -=       Increment or decrement by RHS
*=, /=       Multiply or divide by RHS
%=           Mod by RHS
>>=          Bitwise right shift by RHS (divide by power of 2)
<<=          Bitwise left shift by RHS (multiply by power of 2)
&=, |=, ^=   Bitwise and, or, xor by RHS

Exercises

The theme for the following exercises is dates and times, which often involve lots of interesting calculations (sometimes using truncating integer arithmetic, sometimes using modular arithmetic, sometimes both), and thus good opportunities to use various types of arithmetic operations, comparisons, and assignments.

  1. Write a C program that asks for a year and prints whether the year is a leap year. See the Wikipedia page on leap year for how to test whether a given year is a leap year. Study the first program in the tutorial chapter for how to collect a value from keyboard input, and use the atoi function to convert a C string (char array) value to an integer.

  2. Write a program that asks for year, month, and day values and computes the corresponding Julian Day value. See the Wikipedia page on Julian Day for an algorithm for doing that. (See specifically the expression titled "Converting Gregorian calendar date to Julian Day Number".)

  3. Extend the previous program to compute the Julian date value (a floating point value), using the computation described in "Finding Julian date given Julian day number and time of day" on the Wikipedia page linked in the previous problem. Note that you'll need to additionally ask for the current hour, minute and second from the keyboard.

  4. Write a program that asks for a year value and computes and prints the month and day of Easter in that year. The Wikipedia page on Computus provides more than one algorithm for doing so. Try using the "Anonymous Gregorian algorithm" or the "Gauss algorithm", which is a personal favorite.

References

[Regehr] Regehr. A Guide to Undefined Behavior in C and C++, Part 1. https://blog.regehr.org/archives/213

[Lattner] Lattner. What Every C Programmer Should Know About Undefined Behavior #1/3. http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

Footnotes