7. Functions

All programming languages have built-in mechanisms for structuring and modularizing the code. The main mechanism that C provides is the subroutine, or function. In fact, C provides little beyond this basic technique [1].

Good program design in C (and many other programming languages) involves creating what are typically short functions that accomplish one task, and in which there is little or no duplication of functionality. The main reasons for creating short, single-purpose functions is to make them more easily testable and to make them easier to read. There are many benefits to having one place in the code where each major component is implemented, including making it easier to modify the code and making it easier to test. These ideas are so important in programming that they are present in many different design principles, such as the Abstraction Principle [2], the Don't Repeat Yourself (DRY) principle [3], and structured programming [4].

7.1. Function syntax

A function has a name, a comma-separated list of parameters, the block of code it executes when called, and, optionally, a return value. The basic function definition syntax is:

return-type function-name(parameters) { code-block }

For example, here is a function that computes and returns \(n!\) (n factorial):

 1 /*
 2  * iteratively computes and returns n!
 3  * if n < 0, returns 0.
 4  */
 5 int factorial(int n) {
 6     if (n < 0) {
 7         return 0;
 8     }
 9     int result = 1;
10     for (int i = 2; i <= n; i++) {
11         result *= i;
12     }
13     return result;
14 }

We have seen most of this syntax already (with the main function), but it is worth reviewing.

  1. On line 5, the type declaration of the function is given, and shows that the data type of the return value of the function is int, and that the function takes a single int parameter, named n. The function is, of course, named factorial.

  2. There can be any number of parameters to a function (though a small number of parameters is strongly preferable). Each parameter is separated by a comma, and follows the basic syntax of a variable declaration, which is: type-name variable-name.

  3. Any parameters to the function are passed by value. This means that the function receives copies of the arguments passed to it by the caller. If any modifications are made to those copies, they have zero effect on the variables or values passed to the function --- any changes are confined (local) to the function. We will have more to say about pass-by-value function call semantics, below.

  4. The return statement delivers a value from the function back to the caller of the function. There are return statements on lines 7 and 13 of this function. The first return handles the special case where n is less than 0, thus \(n!\) is undefined.

  5. Any variables declared inside the function are local to the function, just as with Java, Python, and many other programming languages. Thus, the parameter variable n and the local variable result only exist while the function is being executed.

To call the factorial function, a programmer uses parentheses after the function name, passing any required arguments between the parens:

#include <stdio.h>
#include <stdlib.h>

// ...

char buffer[8];
printf("I'll compute n! for you.  What should n be? ");
fgets(buffer, 8, stdin);
n = atoi(buffer);
int result = factorial(n);
printf ("%d! is %d\n", n, result);

So far, none of this should be particularly surprising. You may have already seen "public static methods" in Java (e.g., main!), which are very similar to C functions, or you may have already seen functions in Python (defined using the def keyword). Both public static methods in Java and functions in Python behave very similarly to functions in C. In fact, all of these languages use pass-by-value semantics for parameters.

7.1.1. main is where it all begins

Every C program must have a main function. An attempt to compile a program in C which does not have a main function defined somewhere will result in an error. Unlike Java, where any number of class definitions can have a public static void main definition, it's a highlander situation in C: there can be only one [5].

7.1.2. Function naming restrictions and conventions

The only requirement for naming C functions is similar to many programming languages: function names must either begin with a letter or underscore, and can only include numbers, letters, and underscores. Conventionally, function names that start with an underscore typically mean that the function should be treated as a "private" library function, off limits to other programmers.

As far as naming conventions, there are a wide variety of practices in existence. Some programmers like to name their functions using lowerCamelCase, which is common in languages such as Java, or using snake_case, which is common in Python and Ruby. In the C standard library, a common practice is to use short, abbreviated names consisting of a single word (e.g., tolower). Still others like to use the abomination referred to as Hungarian Notation [6]. A fairly widely used convention in C is snake_case, which is the practice followed in this book.

7.2. Data types for parameters and return values

There are, technically speaking, no restrictions on the data types of parameters or return values for functions in C. Functions in C can accept all the basic data types as parameters, as well as struct types, arrays, and as we will see soon, memory pointers to any data type.

Likewise, there are no syntactic restrictions on the data type of the return value from a function. C does not permit multiple return values, unlike some other languages, but it is permissible to return a struct type that contains multiple fields (or, as we will later see, a pointer to a memory block containing multiple data items).

7.2.1. Parameters to functions are passed by value

The key thing to remember for function parameters is that they are passed by value. (Note that Java also uses pass-by-value semantics for method parameters.) Passing by value means that the actual parameter values are copied into local storage (on the stack). This scheme is fine for many purposes, but it has two disadvantages:

  1. Because the function invocation ("callee") has its own copy (or copies) of parameters, modifications to that memory are always local to the function. Therefore, value parameters do not allow the callee to communicate back to the caller. The function's return value can communicate some information back to the caller, but not all problems can be solved with the single return value.

    "Wait!", you may exclaim. "Can't I pass in a list variable in Python or some object variable in Java and modify that variable within the function or method I call?" The answer is not really. While it's true that you can modify a list that's passed into a function in Python, the parameter variable in the function really just receives a copy of a reference to the list, not the list itself. Same thing for Java: when you pass an object into a method, you're really passing a reference to an object into the method. The method receives a copy of the reference, allowing you to manipulate that object. We can get similar behavior with passing "pointers" into functions in C, which we'll see in the next chapter.

  2. Sometimes it is undesirable to copy the value from the caller to the callee because the value is large and copying is expensive, or because, for some reason, copying the value is undesirable.

7.2.2. Example 1: an anti-example of swapping two values

As mentioned above, a key implication of pass-by-value function parameters is that the function gets local copies of any parameter values. Say that we want to exchange two values via a function. For example, we want to swap the numerator and denominator in a fraction struct. This is some code that would not work:

#include <stdlib.h>

typedef struct fraction {
    int numerator;
    int denominator;
} fraction_t;

void swap_numerator_denominator1(fraction_t frac) {
    int tmp = frac.numerator;
    frac.numerator = frac.denominator;
    frac.denominator = tmp;
}

void swap_numerator_denominator2(int nancy, int donkey) {
    int tmp = nancy;
    nancy = donkey;
    donkey = nancy;
}

int main() {
    fraction_t f1 = { 1, 3};
    swap_numerator_denominator1(f1);  // swap?  uh, no.
    swap_numerator_denominator2(f1.numerator, f1.denominator);  // no, again
    return EXIT_SUCCESS;
}

Epic fail, times 2. For each of the swap functions, the exchange happens only inside the function; the caller will never see any swap of nancy and donkey. The fact that the function takes a struct here is irrelevant for attempt 1; even if we wrote a swap to take two int parameters (i.e., attempt 2), there is no way this is going to happen. Sorry. In the next chapter on Pointers and more arrays, we will solve this problem.

7.2.3. Example 2: passing a struct to and from a function

Ok, enough of the anti-examples. Here is an example of passing and returning a struct. We'll write a function to add two fractions together and return a new fraction. A few things to note about the code below:

  • The computation of the greatest common divisor is recursive.

  • The use of abs in the least common multiple function requires #include <stdlib.h> at the top of the source code.

  • The add_fractions function separately computes the denominator and numerator of the result of the addition, then just constructs and returns a new fraction_t. For both the fraction_t parameters and the return value, entire copies are made to get the arguments "in" to the function and to get the result "out".

// compute the greatest common divisor, recursive style
int gcd(int a, int b) {
    if (b == 0) {
        return a;
    } else {
        return gcd(b, a % b);
    }
}

// compute the least common multiple 
int lcm(int a, int b) {
    return abs(a * b) / gcd(a, b);
}

// add a couple fractions together
fraction_t add_fractions(fraction_t f1, fraction_t f2) {
    int denom = lcm(f1.denominator, f2.denominator);
    int numer = (denom / f1.denominator) * f1.numerator +
                (denom / f2.denominator) * f2.numerator;
    fraction_t result = { numer, denom };
    return result;
}

7.2.4. Example 3: passing an array to a function

Passing an array parameter to a function is somewhat different in nature than the other parameter data types we've seen:

  1. It is often the case that it is not possible to know the correct array length when declaring the formal parameter in the function declaration. This is actually a good thing in disguise: it forces us to write a more general function instead of one that specifies an array of a certain size.

    For example, say we want to write a function to multiply several fractions together, where each fraction is an element of an array. We want to write the function so that it can handle any array size. We write it as shown below. Notice that we leave the array size blank in the formal parameter, and pass a second parameter that specifies the number of array elements we should examine. Since an array in C does not know its own size, we are forced to pass its size separately.

fraction_t multiply_fractions(fraction_t fraclist[], int num_fractions) {
    fraction_t result = { 1, 1};
    for (int i = 0; i < num_fractions; i++) {
        result.numerator *= fraclist[i].numerator;
        result.denominator *= fraclist[i].denominator;
    }
    return result;
}
  1. The second issue that makes array parameters somewhat different than any other data type we've seen thus far is that an array variable refers to the memory address of the first element in the array. As a result, it is possible to modify the contents of an array that is passed to a function. Pass-by-value semantics still apply; the function simply receives a copy of the memory address at which the array begins. Here is an example of a function that modifies a C string by overwriting any trailing whitespace characters with the null-character. Notice that for C strings we do not need to pass the size of the string, since, by convention, the null character marks the end of the string (and thus we can just use the built-in strlen function).

#include <stdio.h>
#include <stdlib.h>

void strip_trailing_whitespace(char string[]) {
    int index = strlen(string)-1;
    while (index >= 0) {
        if (isspace(string[index])) {
            string[index] = '\0';
            index -= 1;
        } else {
            // as soon as we encounter first non-whitespace
            // character, get out of loop
            break;
        }
    }
}

int main() {
    char s[128] = "hello\n\n\t";
    strip_trailing_whitespace(s);
    printf("%s\n", s);
    return EXIT_SUCCESS;
}

How does this jive with pass-by-value? What happens here is that s in main holds the memory address of the array, which is allocated on the stack of main. When the strip_trailing_whitespace function is called, the value of s is copied to the parameter string, but the value itself is a memory address. So the string array inside strip_trailing_whitespace holds the same memory address as s back in main. Thus, these two variables refer to the same array in memory, as depicted in the figure below. As a result, when we modify the string inside the function, the changes can be observed when we return back to main.

_images/arrayparam.png

An array parameter gets a copy of the memory address of the array passed into the function, and thus "points" back to the same array contents as can be observed outside the function.

7.2.5. A longer example

We'll wrap up this section with one more example. A few things to note:

  • The struct student has an embedded struct field (struct course_grade). Actually, an array of struct course_grade. One struct student would occupy a pretty large chunk of memory. It is left to you to compute how many bytes, and where any padding is silently inserted by the compiler.

  • In struct student we need to keep the field num_courses_completed to know how many array elements in courses_completed are meaningful.

  • The typedefs on lines 16-17 help to save a few keystrokes with the struct usage.

  • In compute_gpa, we don't need to specify the size of the grade_t array, but we do need an additional parameter to tell us how many entries in the array we should consider.

  • The initialization syntax for the array of student_t just follows the rules we've discussed for array and struct initialization. It is perfectly valid to nest the curly braces where necessary to achieve the correct field initializations.

 1#include <stdio.h>
 2#include <stdlib.h>
 3
 4struct course_grade {
 5    char course_name[32];
 6    char letter_grade;
 7};
 8
 9struct student {
10    char name[32];
11    short class_year;
12    int num_courses_completed;
13    struct course_grade courses_completed[48];
14};
15
16typedef struct student student_t;
17typedef struct course_grade grade_t;
18
19double compute_gpa(grade_t course_list[], int num_courses) {
20    double sum = 0.0;
21    for (int i = 0; i < num_courses; i++) {
22        if (course_list[i].letter_grade == 'A') {
23            sum += 4.0;
24        } else if (course_list[i].letter_grade == 'B') {
25            sum += 3.0;
26        } else if (course_list[i].letter_grade == 'C') {
27            sum += 2.0;
28        } else if (course_list[i].letter_grade == 'D') {
29            sum += 1.0;
30        }
31    }
32    return sum / num_courses;
33}
34
35void print_student(student_t s, double gpa) {
36    printf("%s, Class of %d, GPA: %2.2f\n", s.name, s.class_year, gpa);
37}
38
39int main() {
40    student_t students[2] = { 
41        { "A. Student", 2019, 3, { {"Flowerpot construction", 'A'},
42                                   {"Underwater basketweaving", 'C'},
43                                   {"Dog grooming", 'B'}  },
44        },
45        { "B. Smart", 2018, 4, { {"Flowerpot construction", 'B'},
46                                 {"Underwater basketweaving", 'C'},
47                                 {"Cat dentistry", 'C'},
48                                 {"Dog grooming", 'C'},  }
49        }
50    };
51    int num_students = sizeof(students) / sizeof(student_t);
52
53    for (int i = 0; i < num_students; i++) {
54        double gpa = compute_gpa(students[i].courses_completed, students[i].num_courses_completed);
55        print_student(students[i], gpa);
56    }
57    return EXIT_SUCCESS;
58}

Compiling and running this code gives the following output:

A. Student, Class of 2019, GPA: 3.00
B. Smart, Class of 2018, GPA: 2.25

7.3. Storage classes, the stack and the heap

There are two essential "storage classes" in C: automatic and static. Static storage is somewhat of an advanced concept, and you can refer to the sidebar for a brief discussion.

"Automatic" variables are what we have used exclusively thus far: they are variables that come into existence when a code block is entered, and which are discarded when a code block is exited. This type of allocation and deallocation is done on the call stack of the running program. Recall that all parameters and local variables are allocated on the stack in a last-in, first-out manner. This is exactly the idea behind the "automatic" storage class --- memory is automatically assigned on the stack [7].

It's worth repeating that all variables in examples we've considered to this point are "automatic" and allocated on the stack. That goes for strings, arrays of various sorts, structures, etc. Most often, we've declared some local variables in main (which are allocated on the stack of main), and passed parameters into functions (which results in the creation of copies of those parameters in the stack frame of the called function).

Exercises

  1. Write a function that takes two struct fractions, a and b and compares them for equality. Return -1 if a is less than b, 0 if they are equal, and 1 if a is greater than b.

  2. Refactor and modularize the code in exercise 1 in the struct chapter. At the very least, write functions to parse a single line into a struct, and to print out a struct.

  3. Write a text-based program to play a version of the game show "Wheel of fortune". Many of you have probably written a similar program in Python or another language. Test your mettle by writing it in C.

Footnotes