C is a simple programming language with few keywords and a relatively simple to understand syntax. C by itself is also useless: it has no input/output commands, no support for strings as a fundamental (atomic) data type, and no useful math functions built in. Because C is useless by itself, it requires the use of libraries. This increases the complexity of C. The issue of standard libraries is resolved through the use of ANSI libraries and other methods.

C Programming :: Hello, World

Let's have a go at a very simple program that prints out "Hello, world!" to standard out (usually your monitor). We'll call our little program hello.c.

    #include <stdio.h>

    main()
    {
        printf("Hello, world!\n");
        return 0;
    }

What's all this junk just to print out Hello, World? Let's see what's happening:

o #include <stdio.h> - Tells the compiler to include this header file for compilation.
    o What is a header file? Header files contain prototypes and other compiler/preprocessor directives. Prototypes are basic abstract function definitions. More on these later...
    o Some common header files are stdio.h, stdlib.h, unistd.h, math.h.
o main() - This is a function, in particular the main block.
o { } - These curly braces are equivalent to stating "block begin" and "block end". These can be used at many places, such as if and switch.
o printf() - Ah... the actual print statement. Thankfully we have the header file stdio.h! But what does it do? How is it defined?
o return 0 - What's this? Who knows!

Seems like trying to figure all this out is just way too confusing. Let's break things up one at a time.

The return 0 statement. Seems like we are trying to give something back, and it is an integer. Maybe if we modified our main function definition:

    int main()

Ok, now we are saying that our main function will be returning an integer! So remember, you should always explicitly declare the return type on the function!

Something is still a little fishy... I remember that 0 implied false... so isn't it returning an int signifying a bad result? Thankfully there is a simple solution to this. Let's add #include <stdlib.h> to our includes and change our return statement to return EXIT_SUCCESS;. Now it makes sense!

Let's take a look at printf. Hmm... I wonder what the prototype for printf is. Utilizing the man pages we see that printf is:

    int printf(const char *format, ...);

printf returns an int. The man pages say that printf returns the number of characters printed. Now you wonder, who cares? Why should you care about this? It is good programming practice to ALWAYS check return values. It will not only make your program more readable, but in the end it will make your programs less error prone.
But in this particular case, we don't really need it. So we cast the function's return to (void). fprintf, fflush, and exit are the only functions where you should do this. More on this later when we get to I/O. For now, let's just void the return value.

What about documentation? We should probably document some of our code so that other people can understand what we are doing. Comments in the C89 standard are written as /* */: the comment begins with /* and ends with */. Let's see our new improved code!

    #include <stdio.h>
    #include <stdlib.h>

    /* Main Function
     * Purpose: Controls program, prints Hello, World!
     * Input: None
     * Output: Returns Exit Status
     */
    int main(int argc, char **argv)
    {
        (void)printf("Hello, world!\n");
        return EXIT_SUCCESS;
    }

Much better! The KEY POINT of this whole introduction is to show you the fundamental difference between correctness and understandability. Both code samples produce the exact same output, "Hello, world!". However, only the latter example is readable enough to be understandable. All code will have bugs. If you sacrifice code readability with reduced (or no) comments and cryptic lines, the burden is shifted and magnified when your code needs to be maintained.

Document what you can: complex data types, function calls that may not be obvious, etc. Good documentation goes a long way!
In the introduction, we discussed very simple C. Now it is time for us to move ahead and explore the basics of C programming. If you do not understand the concepts explained in the Introduction, do not proceed. Make sure you completely understand the topics covered in the introduction before you dive into C.

Operations :: Relational Operators

You probably are familiar with the < and > relational operators from mathematics. The same principles apply in C when you are comparing two objects. There are six possibilities in C: <, <=, >, >=, !=, and ==. The first four are self-explanatory, != stands for "not equal to" and == is "equal to".

Here we can point out the difference between syntax and semantics. a = b is different from a == b: the first assigns b to a, the second compares them. Most C compilers will allow both statements to be used in conditionals like if, but they have two completely different meanings. Make sure your assignment operators are where you want them to be and your relationals where you want relational comparisons!

Operations :: Logical Operators

Logical operators simulate boolean algebra in C. The logical operators are && (AND), || (OR), and ! (NOT). C also has the bitwise operators &, |, and ^, which look similar but work on the individual bits of their operands and do not short-circuit. For example, && is used to combine two comparisons with AND:

    x != 0 && y != 0

Expressions involving && and || undergo Short-Circuit Evaluation. Take the above example into consideration. If x != 0 evaluates to false, the whole expression is false regardless of the outcome of y != 0, so y != 0 is never evaluated. This can be a good thing or a bad thing depending on the context. (See Weiss pg. 51-52).

Operations :: Arithmetic Operators

There are other operators available. Two arithmetic operators that are used frequently are ++ and --. You can place these in front of or behind variables. ++ specifies increment, -- specifies decrement. If the operator is placed in front, it is prefix; if it is placed behind, it is postfix. Prefix means the variable is incremented before its value is used in the surrounding expression; postfix means the old value is used and the increment happens afterwards.
These are important considerations when using these operators.

C allows the *=, +=, /=, and -= operators. For example:

    int i = 5;

    i *= 5;

The int i would have the value 25 after the operation. For a full listing of operators, reference Weiss pg. 51.

Basic C :: Conditionals

The if statement, used with the above relational and logical operators, allows for conditional statements. You can group blocks of code using { and }. An if can be coupled with the else keyword to handle alternative outcomes.

The ? : operator can be a shorthand method for signifying (if expression) ? (evaluate if true) : (else evaluate this). For example, you can use this in a return statement or a printf statement for conciseness. Beware! This reduces the readability of the program... see Introduction. This does not in any way speed up execution time.

The switch statement allows for quick if-else checking. For example, if you wanted to determine what the char x was and have different outcomes for certain values of x, you could simply switch on x and run cases. Some sample code:

    switch ( x ) {
    case 'a':
        /* Do stuff when x is 'a' */
        break;
    case 'b':
    case 'c':
    case 'd':
        /* Fallthrough technique... cases b,c,d all use this code */
        break;
    default:
        /* Handle cases when x is not a,b,c or d. ALWAYS have a default */
        /* case!!! */
        break;
    }

Basic C :: Looping

You can loop (jumping for those assembly junkies) through your code by using special loop keywords. These include while, for, and do while.

The while loops until the expression specified is false. For example while (x < 4) will loop while x is less than 4.

The syntax for for is different. Here's an example: for (i = 0; i < n; i++, z++). This code will loop while i is less than n. The first expression specifies initializing conditions. The second expression is like the while expression: continue the for loop until this expression no longer evaluates to true. The third expression allows for adjustment of loop control variables or other variables. These expressions can be empty, e.g. for (; i < n; i++) does not specify initializing code.

do while is like a "repeat-until" in Pascal.
This is useful for loops that must be executed at least once. Some sample code would be:

    do {
        /* do stuff */
    } while (expression);

Basic C :: Types, Type Qualifiers, Storage Classes

int, char, float, double are the fundamental data types in C.

Type modifiers include: short, long, unsigned, signed. Not all combinations of types and modifiers are available.

Type qualifiers include the keywords: const and volatile. The const qualifier tells the compiler that the variable must not be modified after it is initialized, and the compiler may place it in the constant data area of memory (technically it can still be changed indirectly, for example through a pointer cast, but doing so is undefined behavior). volatile is used less frequently and tells the compiler that this value can be modified outside the control of the program.

Storage classes include: auto, extern, register, static.

The auto keyword places the specified variable into the stack area of memory. This is usually implicit in most variable declarations, e.g. int i;

The extern keyword makes the specified variable access the variable of the same name from some other file. This is very useful for sharing variables in modular programs.

The register keyword suggests to the compiler to place the particular variable in the fast register memory located directly on the CPU. Most compilers these days (like gcc) are so smart that suggesting registers could actually make your program slower.

The static keyword is useful for extending the lifetime of a particular variable. If you declare a static variable inside a function, the variable remains even after the function call is long gone (the variable is placed in the alterable area of memory). The static keyword is overloaded. When used on global variables, it makes them private to the file in which they are declared. static can also be used with functions, making those functions visible only to the file itself.

A string is NOT a type directly supported by C. You, therefore, cannot "assign" stuff into strings. A string is defined by ANSI as an array (or collection) of characters.
We will go more in-depth with strings later...

Basic C, Operations, Types Review

o Relational and Logical operators are used to compare expressions.
o Conditional statements allow for conditional execution of expressions using if, else, switch.
o Loops allow you to repeatedly do things until a stopping point is reached.
o Types, Storage Classes, and Type Qualifiers are used to modify the particular type's scope and lifetime.

Now that we have an understanding of the very basics of C, it is time to turn our focus to making our programs not only run correctly but also more efficient and more understandable.

Functions :: The Basics

Why should we make functions in our programs when we can just do it all under main? Weiss (pg. 77) has a very good analogy that I'll borrow :) Think for a minute about high-end stereo systems. These stereo systems do not come in an all-in-one package, but rather come in separate components: pre-amplifier, amplifier, equalizer, receiver, cd player, tape deck, and speakers. The same concept applies to programming. Your programs become modularized and much more readable if they are broken down into components.

This type of programming is known as top-down programming, because we first analyze what needs to be broken down into components. Functions allow us to create top-down modular programs.

Each function consists of a name, a return type, and a possible parameter list. This abstract definition of a function is known as its interface. Here are some sample function interfaces:

    char *strdup(char *s)
    int add_two_ints(int x, int y)
    void useless(void)

The first function header takes in a pointer to a string and outputs a char pointer. The second header takes in two integers and returns an int. The last header neither returns anything nor takes in parameters.

Some programmers like to put the return type on a separate line from the function name to make the code easier to read and search. This is just a matter of taste.
For example:

    int
    add_two_ints(int x, int y)

A function can return a single value to its caller in a statement using the keyword return. The return value must be the same type as the return type specified in the function's interface.

Functions :: Prototypes

In the introduction, we touched on function prototypes. To recap, what are function prototypes? Function prototypes are abstract function interfaces. These function declarations have no bodies; they just have their interfaces.

Function prototypes are usually declared at the top of a C source file, or in a separate header file (see Appendix: Creating Libraries).

For example, if you wanted to grab command line parameters for your program, you would most likely use the function getopt. But since this function is not part of ANSI C, you must declare the function prototype, or you will get implicit declaration warnings when compiling with our flags. So you can simply prototype getopt(3) from the man pages:

    /* This section of our program is for Function Prototypes */
    int getopt(int argc, char * const argv[], const char *optstring);
    extern char *optarg;
    extern int optind, opterr, optopt;

So if we declared this function prototype in our program, we would be telling the compiler explicitly what getopt returns and its parameter list. What are those extern variables? Recall that extern creates a reference to variables across files, or in other words, it creates file global scope for those variables in that particular C source file. That way we can access these variables that getopt modifies directly. More on getopt in the next section about Input/Output.

Functions :: Functions as Parameters

This is a slightly more advanced section on functions, but it is very useful. Take this for example:

    int applyeqn(int F(int), int x, int max, int min)
    {
        int itmp;

        itmp = F(x) + min;
        itmp = itmp - max;
        return itmp;
    }

What does this function do if we call it with applyeqn(square, x, y, z);? What happens is that the int F(int) is a reference to the function that is passed in as a parameter. Thus inside applyeqn where there is a call to F, it actually is a call to square! Note that we pass the function name alone, without parentheses: writing square(x) as the argument would call square right away and pass its int result rather than the function itself. This is very useful if we have one set function, but wish to vary the input according to a particular function. So if we had a different function called cube we could change how we call applyeqn by calling it as applyeqn(cube, x, y, z);.

Functions :: The Problem

So now you must be thinking... Wow! Functions are great! I can do anything with functions! WRONG. There are four major ways that parameters are passed into functions. The two that we should be concerned with are Pass by Value and Pass by Reference. In C, all parameters are passed by value.

So you're saying so what? It makes a big difference. In simplistic terms, functions in C create copies of the passed in variables. These copies remain on the stack for the lifetime of the function and then are discarded, so they do not affect the inputs! This is important. Let's repeat it again. Passed in arguments will remain unchanged. Let's use this swapping function as an example:

    void swap(int x, int y)
    {
        int tmp = 0;

        tmp = x;
        x = y;
        y = tmp;
    }

If you were to simply pass in parameters to this swapping function that swaps two integers, this would fail horribly. You'll just get the same values back.

But thankfully, you can circumvent this pass by value limitation in C by simulating pass by reference. Pass by reference changes the values that are passed in when the function exits. This isn't how C works technically but can be thought of in the same fashion. So how do you avoid pass by value side effects? By using pointers and in some cases using macros. We will discuss pointers in detail later.

The C Preprocessor :: Overview

The C Preprocessor is not part of the compiler, but is a separate step in the compilation process. In simplistic terms, a C Preprocessor is just a text substitution tool. We'll refer to the C Preprocessor as the CPP.

All preprocessor lines begin with #. This listing is from Weiss pg. 104.

The unconditional directives are:
o #include - Inserts a particular header from another file
o #define - Defines a preprocessor macro
o #undef - Undefines a preprocessor macro

The conditional directives are:
o #ifdef - If this macro is defined
o #ifndef - If this macro is not defined
o #if - Test if a compile time condition is true
o #else - The alternative for #if
o #elif - #else and #if in one statement
o #endif - End preprocessor conditional

Other preprocessor operators include:
o # - Stringization, replaces a macro parameter with a string constant
o ## - Token merge, creates a single token from two adjacent ones

Some examples of the above:

    #define MAX_ARRAY_LENGTH 20

Tells the CPP to replace instances of MAX_ARRAY_LENGTH with 20. Use #define for constants to increase readability. Notice the absence of the ;.

    #include <stdio.h>
    #include "mystring.h"

The first line tells the CPP to get stdio.h from System Libraries and add the text to this file. The next line tells CPP to get mystring.h from the local directory and add the text to the file. This is a difference you must take note of.
    #undef MEANING_OF_LIFE
    #define MEANING_OF_LIFE 42

Tells the CPP to undefine MEANING_OF_LIFE and define it as 42.

    #ifndef IROCK
    #define IROCK "You wish!"
    #endif

Tells the CPP to define IROCK only if IROCK isn't defined already.

    #ifdef DEBUG
    /* Your debugging statements here */
    #endif

Tells the CPP to include the enclosed statements only if DEBUG is defined. This is useful if you pass the -DDEBUG flag to gcc. This will define DEBUG, so you can turn debugging on and off on the fly!

The C Preprocessor :: Parameterized Macros

One of the powerful features of the CPP is the ability to simulate functions using parameterized macros. For example, we might have some code to square a number:

    int square(int x)
    {
        return x * x;
    }

We can instead rewrite this using a macro:

    #define square(x) ((x) * (x))

A few things you should notice. First, in square(x) the left parenthesis must "cuddle" with the macro identifier. The next thing that should catch your eye are the parentheses surrounding the x's. These are necessary... what if we used this macro as square(1 + 1)? Imagine if the macro didn't have those parentheses. It would become ( 1 + 1 * 1 + 1 ). Instead of our desired result of 4, we would get 3. With the added parentheses the expression becomes ( (1 + 1) * (1 + 1) ). This is a fundamental difference between macros and functions. You don't have to worry about this with functions, but you must consider this when using macros.

Remember that pass by value vs. pass by reference issue earlier? I said that you could get around this by using a macro. Here is swap in action when using a macro:

    #define swap(x, y) { int tmp = x; x = y; y = tmp; }

Now we have swapping code that works. Why does this work? It's because the CPP simply replaces text. Wherever swap is called, the CPP will replace the macro call with the defined text. We'll go into how we can do this with pointers later.

Input/Output and File I/O

With most of the basics of C under our belts, let's focus now on grabbing Input and directing Output. This is essential for many programs that might require command line parameters, or standard input.

I/O :: printf(3)

printf(3) is one of the most frequently used functions in C for output. The prototype for printf(3) is:

    int printf(const char *format, ...);

printf takes in a formatting string and the actual variables to print. An example of printf is:

    int x = 5;
    char str[] = "abc";
    char c = 'z';
    float pi = 3.14;

    printf("\t%d %s %f %s %c\n", x, str, pi, "WOW", c);

The output of the above would be:

    	5 abc 3.140000 WOW z

Let's see what's happening. The \t signifies an escape sequence, specifically, a tab. Then the %d is a conversion specification that is filled in with the variable x. The first %s matches with the string str and the %f matches with the float. The default precision for %f is 6 places after the decimal point. %f works for both floats and doubles. For long doubles, use %Lf. The second %s matches with the "WOW", and the %c tells printf to output the char c. The \n signifies a newline. For a listing of escape sequences, see Weiss pg. 183.

You can format the output through the formatting string. By modifying the conversion specification, you can change how the particular variable is placed in output. For example:

    printf("%10.4d", x);

Would print this:

          0005

The . allows for precision. This can be applied to floats as well. The number 10 is the field width: 0005 is right-justified in a field of 10 characters, so the 5 lands in the tenth column. You can also add a + right after % to make the number explicitly output as +0005. A - right after % does not change the sign or the value of x; it left-justifies the output within the field. In other words, using %-10.4d will not output -0005.

%e is useful for outputting floats using scientific notation. With printf, %e works for both floats and doubles; use %Le for long doubles.

I/O :: scanf(3)

scanf(3) is useful for grabbing things from input. Beware though, scanf isn't the greatest function that C has to offer. Some people brush off scanf as a broken function that shouldn't be used often. The prototype for scanf is:

    int scanf(const char *format, ...);

Looks similar to printf, but doesn't completely behave like printf does. Take for example:

    scanf("%d", x);

You'd expect scanf to read in an int into x. But scanf requires that you specify the address where the int should be stored. Thus you use the address-of operator (more on this when we get to pointers). Therefore,

    scanf("%d", &x);

will put an int into x correctly. Simple enough, eh? Think again.
scanf's major "flaw" is its inability to digest incorrect input. If scanf is expecting an int and your standard in keeps giving it a string, scanf will keep trying at the same location. If you looped scanf, this would create an infinite loop. Take this example code:

    int x, args;

    for ( ; ; ) {
        printf("Enter an integer bub: ");
        if (( args = scanf("%d", &x)) == 0) {
            printf("Error: not an integer\n");
            continue;
        } else {
            if (args == 1)
                printf("Read in %d\n", x);
            else
                break;
        }
    }

The code above will fail. Why? It's because scanf isn't discarding bad input. So instead of using just continue;, we have to add a line before it to digest input. We can use a function called digestline().

    void digestline(void)
    {
        scanf("%*[^\n]"); /* Skip to the End of the Line */
        scanf("%*1[\n]"); /* Skip One Newline */
    }

This function is taken from Weiss pg. 341. Using assignment suppression, we can use * to suppress anything contained in the set [^\n]. This skips all characters until the newline. The next scanf reads exactly one newline character. Thus we can digest bad input!

The section on scanf by Weiss is excellent. Section 12.2, pgs. 336-342.

File I/O :: fgets(3)

One of the alternatives to scanf/fscanf is fgets. The prototype is:

    char *fgets(char *s, int size, FILE *stream);

fgets reads in at most size - 1 characters from the stream and stores them in the buffer pointed to by s. The string is automatically null-terminated. fgets stops reading in characters if it reaches an EOF or newline (the newline, if read, is kept in the buffer). Now that you've read characters of interest from a stream, what do you do with the string? Simple! Use sscanf, see below.

File I/O :: sscanf(3)

To scan a string for a format, the sscanf library call is handy. Its prototype:

    int sscanf(const char *str, const char *format, ...);

sscanf works much like fscanf except it takes a character pointer instead of a file pointer. Using the combination of fgets/sscanf instead of scanf/fscanf you can avoid the "digestion" problem (or bug, depending on who you talk to :)

File I/O :: fprintf(3)

It is sometimes useful also to output to different streams. fprintf(3) allows us to do exactly that. The prototype for fprintf is:

    int fprintf(FILE *stream, const char *format, ...);

fprintf takes in a special pointer called a file pointer, signified by FILE *.
It then accepts a formatting string and arguments. The only difference between fprintf and printf is that fprintf can direct output to a particular stream. These streams can be stdout, stderr, or a file pointer. More on file pointers when we get to fopen. An example:

    fprintf(stderr, "ERROR: Cannot malloc enough memory.\n");

This outputs the error message to standard error.

File I/O :: fscanf(3)

fscanf(3) is basically a streams version of scanf. The prototype for fscanf is:

    int fscanf(FILE *stream, const char *format, ...);

File I/O :: fflush(3)

Sometimes it is necessary to forcefully flush a buffer to its stream. If a program crashes, sometimes the stream isn't written. You can force the write by using the fflush(3) function. The prototype for fflush is:

    int fflush(FILE *stream);

Not very difficult to use: specify the stream to fflush.

File I/O :: fopen(3), fclose(3), and File Pointers

fopen(3) is used to open streams. This is most often used for opening files for input. fopen's prototype is:

    FILE *fopen(const char *path, const char *mode);

fopen returns a file pointer and takes in the path to the file as well as the mode to open the file with. Take for example:

    FILE *Fp;

    Fp = fopen("/home/johndoe/input.dat", "r");

This will open the file /home/johndoe/input.dat for reading. Now you can use fscanf commands with Fp. For example:

    fscanf(Fp, "%d", &x);

This would read in an integer from the input.dat file. If we opened input.dat with the mode "w", then we could write to it using fprintf:

    fprintf(Fp, "%s\n", "File Streams are cool!");

To close the stream, you would use fclose(3). The prototype for fclose is:

    int fclose(FILE *stream);

You just give fclose the stream to close. Remember to do this for all file streams, especially when writing to files!

I/O and File I/O :: Return Values

We have been looking at I/O functions without any regard to their return values. This is bad. Very, very bad. So to make our lives easier and to make our programs behave well, let's write some wrapper macros for these functions. First let's create a meta-wrapper macro for all of our printf type functions:

    #define ERR_MSG( fn ) { (void)fflush(stderr); \
        (void)fprintf(stderr, __FILE__ ":%d:" #fn ": %s\n", \
                      __LINE__, strerror(errno)); }
    #define METAPRINTF( fn, args, exp ) if( fn args exp ) ERR_MSG( fn )

This will create an ERR_MSG macro to handle error messages. The METAPRINTF is the meta-wrapper for our printf type functions. So let's define our printf type macros:

    #define PRINTF(args) METAPRINTF(printf, args, < 0)
    #define FPRINTF(args) METAPRINTF(fprintf, args, < 0)
    #define SCANF(args) METAPRINTF(scanf, args, < 0)
    #define FSCANF(args) METAPRINTF(fscanf, args, < 0)
    #define FFLUSH(args) METAPRINTF(fflush, args, < 0)

Now we have our wrapper macros. Because args is sent to METAPRINTF, we need two sets of parentheses when we use the PRINTF macro. Examples of using the wrapper macros:

    PRINTF(("This is so cool!"));
    FPRINTF((stderr, "Error bub!"));

Now you can put this code into a common header file and be able to use these convenient macros and still be able to check for return values!
(Make sure you have included string.h for strerror and errno.h for errno.)

Note: We did not write macros for fopen and fclose. You must manually check for return values on those functions.

Other I/O Functions

There are many other Input/Output functions, such as fputs, getchar, putchar, ungetc. Refer to the man pages on these functions or to the Weiss text.

Command Line Arguments and Parameters :: getopt(3)

I'm sure you've run the ls -l command before. ls -l *.c would display all C files with extended information. These parameters and arguments can be handled by your C program through getopt(3).

We have already seen getopt, but now let's actually make some code that makes this function useful. Let's see the prototype again:

    int getopt(int argc, char * const argv[], const char *optstring);
    extern char *optarg;
    extern int optind, opterr, optopt;

In order for us to utilize argc and argv, we must declare them as parameters of our main() function:

    int main(int argc, char **argv)

Now that we have everything set up, let's get this show on the road. Here's an example of using getopt:

    int ich;

    while ((ich = getopt (argc, argv, "ab:c")) != -1) {
        switch (ich) {
        case 'a': /* Flags/Code when -a is specified */
            break;
        case 'b': /* Flags/Code when -b is specified */
                  /* The argument passed in with b is specified */
                  /* by optarg */
            break;
        case 'c': /* Flags/Code when -c is specified */
            break;
        default:  /* Code for unrecognized options (getopt returns '?') */
            break;
        }
    }

    if (optind < argc) {
        printf ("non-option ARGV-elements: ");
        while (optind < argc)
            printf ("%s ", argv[optind++]);
        printf ("\n");
    }

This code might be a bit confusing if taken in all at once.

o The first step is to get getopt to pass an int into ich. The options allowed are specified by the "ab:c". The colon following b requires b to have an argument, e.g. -b gradient. Thus, optarg will contain the string "gradient". When there are no more options to process, getopt returns -1.
o ich is then switched to check for the parameters.
o The ending conditional if (optind < argc) { checks for arguments passed in without an accompanying parameter. optind is the current index in the list of arguments passed in in the argv-list. argc is the total number of arguments passed in.
o So if we had a program called "junk" and we called it from the command prompt as ./junk -b gradient yeehaw the variables would look like:

    Variable                           Contains
    -------------------------------    ----------
    argc                               4
    argv[0]                            "./junk"
    argv[1]                            "-b"
    argv[2]                            "gradient"
    argv[3]                            "yeehaw"
    optarg at case 'b'                 "gradient"
    optind after while getopt loop     3

Input/Output and File I/O Review

o printf and scanf can be used for Input and Output, while the "f versions" of these can be used to work with other streams.
o Make sure you check the return values!
o You can grab command line arguments and parameters through getopt.

Functions and C Preprocessor Review

o Functions allow for modular programming. You must remember that all parameters passed into functions in C are passed by value!
o The C Preprocessor allows for macro definitions and other pre-compilation directives. It is just a text substitution tool that runs before the actual compilation begins.

Pointers :: Definition

Pointers provide an indirect method of accessing variables. The reason why some people have difficulty understanding the concept of a pointer is that they are usually introduced without some sort of analogy or easily understood example.

For our simple to understand example, let's think about a typical textbook. It will usually have a table of contents, some chapters, and an index. Suppose we have a Chemistry textbook and would like to find more information on the noble gases. What one would typically do, instead of flipping through the entire text, is consult the index in the back. The index would direct us to the page(s) on which we can read more on noble gases. Conceptually, this is how pointers work!

A pointer is simply a reference containing a memory address. In our example, the noble gas entry in the index would list page numbers for more information. This is analogous to a pointer reference containing the memory address of where the real data is actually stored!

You may be wondering, what is the point of this (no pun intended)? Why don't I just make all variables without the use of pointers? It's because sometimes you can't. What if you needed an array of ints, but didn't know the size of the array beforehand? What if you needed a string, but it grew dynamically as the program ran? What if you need variables that are persistent through function use without declaring them global (remember the swap function)? They are all solved through the use of pointers. Pointers are also essential in creating larger custom data structures, such as linked lists.
So now that you understand how pointers work, let's define them a little better. o A pointer when declared is just a reference. DECLARING A POINTER DOES NOT CREATE ANY SPACE FOR THE POINTER TO POINT TO. We will tackle this dynamic memory allocation issue later. o As stated prior, a pointer is a reference to an area of memory. This is known as a memory address. A pointer may point to dynamically allocated memory or a variable declared within a block. o Since a pointer contains memory addresses, the size of a pointer typically corresponds to the word size of your computer. You can think of a "word" as how much data your computer can access at once. Typical machines today are 32- or 64-bit machines. 8-bits per byte equates to 4- or 8-byte pointer sizes. More on this later. Pointers :: Declaration and Syntax Pointers are declared by using the * in front of the variable identifier. For example: int *ip; float *fp = NULL; This delcares a pointer, ip, to an integer. Let's say we want ip to point to an integer. The second line delares a pointer to a float, but initializes the pointer to point to the NULLpointer. The NULL pointer points to a place in memory that cannot be accessed. NULL is useful when checking for error conditions and many functions return NULL if they fail. int x = 5; int *ip;
    ip = &x;

We first encountered the & operator in the I/O section. The & operator gives the address-of x. Thus, the pointer ip points to x once it has been assigned the address of x. This is important. You must understand this concept.

This brings up the question, if pointers contain addresses, then how do I get the actual value of what the pointer is pointing to? This is solved through the * operator. The * dereferences the pointer to the value. So,

    printf("%d %d\n", x, *ip);

would print 5 5 to the screen.

There is a critical difference between a dereference and a pointer declaration:

    int x = 0, y = 5, *ip = &y;
    x = *ip;

The statement int *ip = &y; is different from x = *ip;. The first statement does not dereference; there the * signifies that ip is being created as a pointer to an int. The second statement uses a dereference.

Remember the swap function? We can now simulate call by reference using pointers. Here is a modified version of the swap function using pointers:

    void swap(int *x, int *y)
    {
        int tmp;

        tmp = *x;
        *x = *y;
        *y = tmp;
    }

    int main()
    {
        int a = 2, b = 3;

        swap(&a, &b);
        return EXIT_SUCCESS;
    }

This snip of swapping code works. When you call swap, you must give the address-of a and b, because swap is expecting pointers. Why does this work? It's because you are giving the address-of the variables. This memory does not "go away" or get "popped off" after the function swap ends. The changes within swap change the values located in those memory addresses.

Pointers :: Pointers and const Type Qualifier

The const type qualifier can make things a little confusing when it is used with pointer declarations. The below example is from Weiss pg. 132:

    const int * const ip;  /* The pointer ip is const and what it points at is const */
    int * const ip;        /* The pointer ip is const */
    const int * ip;        /* What ip is pointing at is const */
    int * ip;              /* Nothing is const */

As you can see, you must be careful when specifying the const qualifier when using pointers.

Pointers :: void Pointers

A void pointer can be assigned any pointer value. It is sometimes necessary to store/copy/move pointers without regard to the type they reference. You cannot dereference a void pointer. Functions such as malloc and free utilize void pointers.

Pointers :: Pointers to Functions

Earlier, we said that you can pass functions as parameters into functions. This was essentially a reference, or pointer, passed into the function. There is an alternative way to declare and pass in functions as parameters into functions. It is discussed in detail in Weiss, pgs. 135-136.

Pointers :: Pointer Arithmetic

C is one of the few languages that allows pointer arithmetic. In other words, you actually move the pointer reference by an arithmetic operation. For example:

    int x = 5, *ip = &x;
    ip++;

On a typical 32-bit machine, *ip would be 5 after initialization. But ip++; increments the pointer by 32 bits, or 4 bytes. So ip would now be pointing at whatever is in the next 4 bytes. (That memory is not part of x, so dereferencing ip after the increment is not safe here.) Pointer arithmetic is very useful when dealing with arrays, because arrays and pointers share a special relationship in C. More on this when we get to arrays!

Pointers Review

Pointers are an indirect reference to something else. They are primarily used to reference items that might dynamically change size at run time. Pointers have special operators, & and *. The & operator gives the address-of a variable. The * dereferences the pointer (when not used in a pointer declaration statement). You must be careful when using the const type qualifier. You also have to be cautious about the void pointer. C allows pointer arithmetic, which gives the programmer the freedom to move the pointer using simple arithmetic. This is very powerful, yet can lead to disaster if not used properly.
Arrays

You must understand the concepts discussed in the previous pointers section before proceeding. Arrays are a collection of items (i.e. ints, floats, chars) whose memory is allocated in a contiguous block of memory. Arrays and pointers have a special relationship. This is because arrays use pointers to reference memory locations. Therefore, most of the time, pointer and array references can be used interchangeably.

Arrays :: Declaration and Syntax

A simple array of 5 ints would look like:

    int ia[5];

This would effectively make an area in memory (if available) for ia, which is 5 * sizeof(int). We will discuss sizeof() in detail in Dynamic Memory Allocation. Basically sizeof() returns the size of what is being passed. On a typical 32-bit machine, sizeof(int) returns 4 bytes, so we would get a total of 20 bytes of memory for our array.

How do we reference areas of memory within the array? By using the [ ] we can effectively "dereference" those areas of the array to return values.

    printf("%d ", ia[3]);

This would print the fourth element in the array to the screen. Why the fourth? This is because array elements are numbered from 0.

Note: You cannot size an array using a variable. ANSI C does not allow this. For example:

    int x = 5;
    int ia[x];

The above example is illegal. ANSI C restricts the array size in a declaration to be a constant. So is this legal?

    int ia[];

No. The array size is not known at compile time. How can we get around this? By using macros, which also make our program more readable!

    #define MAX_ARRAY_SIZE 5
    /* .... code .... */
    int ia[MAX_ARRAY_SIZE];

Now if we wanted to change the array size, all we'd have to do is change the define statement! But what if we don't know the size of our array at compile time? That's why we have Dynamic Memory Allocation. More on this later...

Can we initialize the contents of the array? Yes!

    int ia[5] = {0, 1, 3, 4};
    int ia[ ] = {0, 2, 1};

Both of these work. In the first one, ia is 20 bytes long with the first 16 bytes initialized to 0, 1, 3, 4; the one remaining element is set to 0. The second one is also valid: the compiler sizes the array at 12 bytes, initialized to 0, 2, 1. (Examples on a typical 32-bit machine.)

Arrays :: Relationship with Pointers

So what's up with all this pointers are related to arrays junk? This is because an array name, in most expressions, is treated as just a pointer to the beginning of the allocated memory space. This causes "problems" in C, as the Limitations sub-section will show. Let's take this example and analyze it:

    int ia[6] = {0, 1, 2, 3, 4, 5};                         /* 1 */
    int *ip;                                                /* 2 */
    ip = ia;             /* equivalent to ip = &ia[0]; */   /* 3 */
    ip[3] = 32;          /* equivalent to ia[3] = 32; */    /* 4 */
    ip++;                /* ip now points to ia[1] */       /* 5 */
    printf("%d ", *ip);  /* prints 1 to the screen */       /* 6 */
    ip[3] = 52;          /* equivalent to ia[4] = 52 */     /* 7 */

Ok, so what's happening here? Let's break this down one line at a time. Refer to the line numbers on the side:

1. Initialize ia.
2. Create ip: a pointer to an int.
3. Assign ip pointer to ia. This is effectively assigning the pointer to point to the first position of the array.
4. Assign 32 to the fourth position in the array. But how? ip is just a pointer?!?! But what is ia? Just a pointer! (heh)
5. Use pointer arithmetic to move the pointer over in memory to the next block. Pointer arithmetic is automatically scaled by sizeof() of the type pointed to, so ip++ moves ahead by one int.
6. Prints ia[1] to the screen, which is 1.
7. Sets ia[4] to 52. Why the fifth position? Because ip points to ia[1] from the ip++ line.

Now it should be clear. Pointers and arrays have a special relationship because arrays are actually just a pointer to a block of memory!

Arrays :: Multidimensional

Sometimes it's necessary to declare multidimensional arrays. In C, multidimensional arrays are row major. In other words, the first bracket specifies the number of rows. Some examples of multidimensional array declarations:

    int igrid[2][3] = { {0, 1, 2}, {3, 4, 5} };
    int igrid[2][3] = { 0, 1, 2, 3, 4, 5 };
    int igrid[ ][4] = { {0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9} };
    int igrid[ ][2];

The first three examples are valid, the last one is not. As you can see from the first two examples, the inner braces are optional. The third example shows that the number of rows does not have to be specified in an array initialization.

This seems simple enough. But what if we stored pointers in our arrays? This would effectively create a multidimensional array! Since reinforcement of material is key to learning it, let's go back to getopt. Remember the variable argv? It can be declared in the main function as either **argv or *argv[].
What does **argv mean? It looks like we have two pointers or something. This is actually a pointer to a pointer. The *argv[] means the same thing, right? Imagine (pardon the crappy graphics skills):

    argv
    +---+
    | 0 | ---> "./junk"
    +---+
    | 1 | ---> "-b"
    +---+
    | 2 | ---> "gradient"
    +---+
    | 3 | ---> "yeehaw"
    +---+

So what would argv[0][1] be? It would be the character '/'. Why is this? It's because strings are just an array of characters. So in effect, we have a pointer to the actual argv array and a pointer at each argv location to each string. A pointer to a pointer. We will go more in depth into strings later.

Arrays :: Limitations

Because the name of an array represents just a pointer to the beginning of the array, we have some limitations or "problems."

1. No Array Out of Bounds Checking. For example:

       int ia[2] = {0, 1};

       printf("%d ", ia[2]);

   The above code is broken, because you are trying to look at an area of memory not inside the array memory allocation. It might segfault, or it might quietly print junk; C gives no guarantee either way.
2. Array Size Must be Constant or Known at Compile-time. See Arrays :: Declaration and Syntax.
3. Arrays Cannot be Copied or Compared. Why? Because they are pointers. See Weiss pg. 149 for a more in-depth explanation.
4. Array Index Type must be Integral.

Another limitation comes with arrays being passed into functions. Take for example:

    void func(int ia[])
    void func(int *ia)

Both are the same declaration (you should know why by now). But why would this cause problems? Because only the pointer to the array is passed in, not the whole array. So what if you mistakenly did a sizeof(ia) inside func? Instead of returning the size of the whole array, it would return the size of a pointer, which corresponds to the word size of the computer.

Arrays Review

Arrays are, in simple terms, just a pointer! Remember that! There isn't much to review. Remember arrays have limitations because they are inherently just a pointer. Wait we mentioned that already.
:)

Dynamic Memory Allocation

Now that we have a firm grasp on pointers, how can we allocate memory at run-time instead of compile time? ANSI C provides five standard tools that help you allocate memory on the heap.

Dynamic Memory Allocation :: sizeof()

We have already seen sizeof() in the array section. To recap, sizeof() gives back a size_t holding the size of the item passed in. So on a typical 32-bit machine, sizeof(int) returns 4 bytes. size_t is just an unsigned integer type. sizeof() is helpful when using malloc or calloc calls. Note that sizeof() does not always return what you may expect (see below).

Dynamic Memory Allocation :: malloc(3), calloc(3), bzero(3), memset(3)

The prototype for malloc(3) is:

    void *malloc(size_t size);

malloc takes in a size_t and returns a void pointer. Why does it return a void pointer? Because it doesn't matter to malloc what type this memory will be used for. Let's see an example of how malloc is used:

    int *ip;
    ip = malloc(5 * sizeof(int));

Pretty simple. sizeof(int) returns the size of an integer on the machine, multiply by 5 and malloc that many bytes. Wait... we're forgetting something. AH! We didn't check for return values. Here's some modified code:

    #define INITIAL_ARRAY_SIZE 5
    /* ... code ... */
    int *ip;

    if ((ip = malloc(INITIAL_ARRAY_SIZE * sizeof(int))) == NULL) {
        (void)fprintf(stderr, "ERROR: Malloc failed");
        (void)exit(EXIT_FAILURE); /* or return EXIT_FAILURE; */
    }

Now our program properly prints an error message and exits gracefully if malloc fails.

calloc(3) works like malloc, but also initializes the memory to zero. The prototype is:

    void *calloc(size_t nmemb, size_t size);

Refer to Weiss pg. 164 for more information on calloc.

bzero(3) sets the first n bytes that s points to to zero. Prototype:

    void bzero(void *s, size_t n);

If you need to set the memory to some other value (or just as a general alternative to bzero, which comes from BSD rather than ANSI C), you can use memset:

    void *memset(void *s, int c, size_t n);

where you can specify c as the value to fill for n bytes of pointer s.

Dynamic Memory Allocation :: realloc(3)

What if we run out of allocated memory during the run-time of our program and need to give our collection of items more memory? Enter realloc(3), its prototype:

    void *realloc(void *ptr, size_t size);

realloc takes in the pointer to the original area of memory to enlarge and how much the total size should be. So let's give it a try:

    ip = realloc(ip, sizeof(ip) + sizeof(int)*5);

Now we have some more space through adding the size of the complete array and an additional 5 spaces for ints... STOP! This is NOT how you use realloc. Again. The above example is wrong. Why?

First, sizeof(ip) does not give the size of the space originally allocated by malloc (or a previous realloc). Using sizeof() on a pointer only returns the size of the pointer, which is probably not what you intended.

Also, what happens if the realloc on ip fails? ip gets set to NULL, and the memory previously allocated to ip now has no pointer to it. Now we have allocated memory just floating in the heap without a pointer. This is called a memory leak. This can happen from sloppy reallocs and from not using free on malloc'd space. So what is the correct way?
Take this code for example:

    int *tmp;
    if ((tmp = realloc(ip, sizeof(int) * (INITIAL_ARRAY_SIZE + 5))) == NULL) {
        /* Possible free on ip? Depends on what you want */
        fprintf(stderr, "ERROR: realloc failed");
    } else {
        ip = tmp;
    }

Now we are creating a temporary pointer to try a realloc. If it fails, then it isn't a big problem as we keep our ip pointer on the original memory space; ip is only updated when realloc succeeds. Also, note that we specified the real size of our original array and now are adding 5 more ints (so 4 bytes * (5 + 5) = 40 bytes, on a typical 32-bit machine).

Dynamic Memory Allocation :: free(3)

Now that we can malloc, calloc, and realloc we need to be able to free the memory space if we have no use for it anymore. Like we mentioned above, any memory space that loses its pointer or isn't free'd is a memory leak. So what's the prototype for free(3)? Here it is:

    void free(void *ptr);

free simply takes in a pointer to free. Not challenging at all. Note that free can take in NULL, as specified by ANSI.

Dynamic Memory Allocation :: Multi-dimensional Structures

It's nice that we can create a "flat" structure, like an array of 100 doubles. But what if we want to create a 2D array of doubles at runtime? This sounds like a difficult task, but it's actually simple!

As an example, let's say we are reading in x, y, z coordinates from a file of unknown length. The incorrect method to approach this task is to create an arbitrarily large 2D array with hopefully enough rows or entries. Instead of leaving our data structure to chance, let's just dynamically allocate, and re-allocate on the fly.

First, let's define a few macros to keep our code looking clean:

    #define oops(s) { perror((s)); exit(EXIT_FAILURE); }
    #define MALLOC(s,t) if(((s) = malloc(t)) == NULL) { oops("error: malloc() "); }
    #define INCREMENT 10

The MALLOC macro simply takes in the pointer (s) to the memory space to be allocated and the number of bytes to allocate (t).
oops is called when malloc fails; it prints the error message for the failure (via perror) and exits the program. INCREMENT is the default number of entries to allocate when we run out of allocated space.

On to the dynamic memory allocation!

    double **xyz;
    int i;
    MALLOC(xyz, sizeof(double *) * INCREMENT);
    for (i = 0; i < INCREMENT; i++) {
        MALLOC(xyz[i], sizeof(double) * 3);
    }

What's going on here? Our double pointer, xyz, is our actual 2D storage array. We must use a double pointer, because we are pointing to multiple pointers to doubles! If this sounds confusing, think of it this way. Instead of each array entry having a real double entry, each array position contains a pointer to another array of doubles! Therefore, we have our desired 2D array structure.

The first MALLOC call instructs malloc to create 10 double pointers in the xyz array. So each of these 10 array positions now holds an uninitialized pointer to a double. The for loop goes through each array position and creates a new array of three doubles at each position, because we want to read in x, y, z coordinates for each entry. The total space we just allocated is 10 spaces of 3 doubles each. So we've just allocated 30 double spaces.

What if we run out of space? How do we reallocate?

    double **tmp;
    int current_size, n;
    /* clip ... other code */

    if (current_size >= n) {
        if ((tmp = realloc(xyz, sizeof(double *) * (n + INCREMENT))) == NULL) {
            oops("realloc() error! ");
        }
        for (i = n; i < n + INCREMENT; i++) {
            MALLOC(tmp[i], sizeof(double) * 3);
        }
        n += INCREMENT;
        xyz = tmp;
    }

What's going on here? Suppose our file of x, y, z coordinates is longer than 10 lines. On the 11th line, we'll invoke the realloc(). n is the current number of rows allocated. current_size indicates the number of rows we are working on (in our case, the expression would be 10 >= 10). We instruct realloc to reallocate space for xyz of (double *) type, or double pointers, of the current size (n) plus the INCREMENT. This will give us 10 additional entries. Remember NEVER reallocate to the same pointer!!

If realloc() succeeds, then we need to allocate space for the double array of size 3 to hold the x, y, z coordinates in the new xyz realloc'd array. Note the for loop, where we start and end. Then we clean up by recording our new max array size allocated (n) and setting the xyz double pointer to the newly realloc'd and malloc'd space, tmp.

Not as difficult as you might have imagined it to be, right? What if we're done with our array? We should free it!

    for (i = 0; i < n; i++) {
        free(xyz[i]);
    }
    free(xyz);

The above code frees each entry in the xyz array (the actual double pointers to real data) and then we free the pointer to a pointer reference. The statements cannot be reversed, because you'll lose the pointer reference to each 3-entry double array!

Dynamic Memory Allocation Review

You have powerful tools you can use when allocating memory dynamically: sizeof, malloc, calloc, realloc, and free. Take precautions against memory leaks when using the actual memory allocation functions, especially realloc. Remember, always check for NULL with malloc! Your programs will thank you for it.

Strings

We have discussed arrays previously, but we have not discussed them in depth in the context of character arrays.
These character arrays are referred to as strings. Again, strings are not directly supported in C. Let's try that again, there is no direct string support in C. So how do we emulate strings in C? By correctly creating string constants or properly allocating space for a character array we can get some string action in C.

Strings :: Declaration and Syntax

Let's see some examples of string declarations:

    char str[5] = {'l', 'i', 'n', 'u', 'x'};
    char str[6] = {'l', 'i', 'n', 'u', 'x', '\0'};
    char str[3];
    char str[ ] = "linux";
    char str[5] = "linux";
    char str[9] = "linux";

All of the above declarations are legal. But which ones don't work? The first one is a valid declaration, but will cause major problems because it is not null-terminated. The second example shows a correct null-terminated string. The special escape character \0 denotes string termination. The fifth example suffers the same problem as the first: there is no room left for the null-terminator. The fourth example, however, does not. This is because the compiler will determine the length of the string and automatically initialize the last character to a null-terminator. The sixth example is fine too, since the unused characters at the end are set to \0.

Strings :: Dynamic Memory Allocation

This stuff is much the same as the previous section. You must be careful to allocate one additional space to contain the null-terminator. For example:

    char *s;
    if ((s = malloc(sizeof(char) * 5)) == NULL) {
        /* ERROR Handling code */
    }
    strcpy(s, "linux");
    printf("%s\n", s);

This is a bug. strcpy copies six characters (the five letters plus the null-terminator) into space for only five, so the null-terminator lands past the allocated memory for s. The result could be a bunch of junk being printed to the screen, or a crash. The simple solution would be to add 1 to the malloc call. You must be particularly careful when using malloc or realloc in combination with strlen. strlen returns the length of a string not counting the null-terminator. More on strlen in the next sub-section.

A final note: What's wrong with the following code:

    char s1[ ] = "linux";
    char *s2;
    strcpy(s2, s1);

Remember that simply declaring a pointer does not create any space for the pointer to point to (remember that?).

Strings :: string.h Library

You can add support for string operations via the string.h library. (Note: If you understand everything that has gone on by now, you should be able to code most of the functions in string.h!) Below is a listing of prototypes for commonly used string functions:

    size_t strlen(const char *s);
    char *strdup(const char *s);
    char *strcpy(char *dest, const char *src);
    char *strncpy(char *dest, const char *src, size_t n);
    char *strcat(char *dest, const char *src);
    char *strncat(char *dest, const char *src, size_t n);
    int strcmp(const char *s1, const char *s2);
    int strncmp(const char *s1, const char *s2, size_t n);
    int atoi(const char *nptr);
    double atof(const char *nptr);

(atoi and atof work on strings, but they are declared in stdlib.h rather than string.h.) See Weiss pg. 486 (Appendix D.14) for a full string.h listing. Weiss Appendix D is your friend. Use it!

Strings Review

Strings are just character arrays. Nothing more, nothing less. Strings must be null-terminated if you want to properly use them. Remember to take into account null-terminators when using dynamic memory allocation. The string.h library has many useful functions. Most of the I/O involved with strings was covered in the Input/Output and File I/O section. If you still don't grasp strings, read Weiss Chapter 8.

Structures

A structure in C is a collection of items of different types. You can think of a structure as a "record" in Pascal or a class in Java without methods. Structures, or structs, are very useful in creating data structures larger and more complex than the ones we have discussed so far. We will take a cursory look at some more complex ones in the next section.

Structures :: Declaration and Syntax

So how is a structure declared and initialized? Let's look at an example:

    struct student {
        char *first;
        char *last;
        char SSN[10];
        float gpa;
        char **classes;
    };
    struct student student_a, student_b;

Another way to declare the same thing is:

    struct {
        char *first;
        char *last;
        char SSN[10];
        float gpa;
        char **classes;
    } student_a, student_b;

As you can see, the tag immediately after struct is optional. But in the second case, if you wanted to declare another struct of this kind later, you couldn't.

The "better" method of declaring structs is:

    struct student_t {
        char *first;
        char *last;
        char SSN[10];
        float gpa;
        char **classes;
    } student, *pstudent;

Now we have created a struct student_t called student and a pointer to a struct student_t called pstudent. The pointer allows us greater flexibility (e.g. creating lists of students).

How do you go about initializing a struct? You could do it just like an array initialization. But be careful with the pointer fields: an initializer does not create any space for them to point to.

But how do we access fields inside of the structure? C has a special operator for this called the "member of" operator, denoted by . (period). For example, to assign the SSN of student_a:

    strcpy(student_a.SSN, "111223333");

Structures :: Pointers to Structs

Sometimes it is useful to assign pointers to structures (this will be evident in the next section with self-referential structures). Declaring pointers to structures is basically the same as declaring a normal pointer:

    struct student *student_a;

But how do we dereference the pointer to the struct and its fields? You can do it in one of two ways. The first way is:

    printf("%s\n", (*student_a).SSN);

This would get the SSN in student_a. Messy and the readability is horrible! Is there a better way? Of course, programmers are lazy! :)

To dereference, you can use the infix operator: ->. The above example using the new operator:

    printf("%s\n", student_a->SSN);

If we malloc'd space for the structure that student_a points to, could we start assigning things to pointer fields inside the structure? No. You must malloc space for each individual pointer within the structure that is being pointed to.
Structures :: typedef

There is an easier way to define structs, or to "alias" types you create. For example:

    typedef struct {
        char *first;
        char *last;
        char SSN[10];
        float gpa;
        char **classes;
    } student;

    student student_a;

Now we get rid of those silly struct tags. You can use typedef for non-structs:

    typedef long int *pint32;

    pint32 x, y, z;

x, y and z are all pointers to long ints. typedef is your friend. Use it.

Structures :: Unions

Unions are declared in the same fashion as structs, but have a fundamental difference. Only one item within the union can be used at any time, because the memory allocated for each item inside the union is in a shared memory location. Why you ask? An example first:

    struct conditions {
        float temp;
        union {
            float wind_chill;
            float heat_index;
        } feels_like;
    } today;

As you know, wind_chill is only calculated when it is "cold" and heat_index when it is "hot". There is no need for both. So when you specify the temp in today, feels_like only has one value, either a float for wind_chill or a float for heat_index.

Types inside of unions are unrestricted, you can even use structs within unions.

Structures :: Enumerated Types

What if you wanted a series of constants without creating a new type? Enter enumerated types. Say you wanted an "array" of months in a year:

    enum e_months {JAN=1, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC};
    typedef enum e_months month;
    month currentmonth;
    currentmonth = JUN; /* same as currentmonth = 6; */
    printf("%d\n", currentmonth);

We are enumerating the months in a year into a type called month. You aren't creating a type, because enumerated types are simply integers. Thus the printf statement uses %d, not %s.

If you notice the first month, JAN=1 tells C to make the enumeration start at 1 instead of 0.

Note: This would be almost the same as using:

    #define JAN 1
    #define FEB 2
    #define MAR 3
    /* ... etc ... */

Structures :: Abilities and Limitations

You can create arrays of structs. Structs can be copied or assigned. The & operator may be used with structs to show addresses. Structs can be passed into functions. Structs can also be returned from functions. Structs cannot be compared!

Structures Review

Structures can store non-homogeneous data types in a single collection, much like an array does for common data (except it isn't accessed in the same manner). Pointers to structs have a special infix operator: -> for dereferencing the pointer. typedef can help you clear your code up and can help save some keystrokes. Enumerated types allow you to have a series of constants much like a series of #define statements.

Advanced Data Structures

In the previous section, we mentioned that you can create pointers to structures. The Data Structures presented here all require pointers to structs, or more specifically they are self-referential structures. These self-referential structures contain pointers within the structs that refer to another identical structure.

Advanced Data Structures :: Linked Lists

Linked lists are the most basic self-referential structures. Linked lists allow you to have a chain of structs with related data.

So how would you go about declaring a linked list? It would involve a struct and a pointer:

    struct llnode {
        <type> data;
        struct llnode *next;
    };

The <type> signifies data of any type. This is typically a pointer to something, usually another struct.
The next line is the next pointer to another llnode struct. Another more convenient way using typedef:

    typedef struct list_node {
        <type> data;
        struct list_node *next;
    } llnode;

    llnode *head = NULL;

Note that even though the typedef is specified, the next pointer within the struct must still have the struct tag!

There are two ways to create the root node of the linked list. One method is to create a head pointer and the other way is to create a dummy node. It's usually easier to create a head pointer.

Now that we have a node declaration down, how do we add or remove from our linked list? Simple! Create functions to do additions, removals, and traversals.

o Additions: A sample linked list addition function:

      void add(llnode **head, <type> data_in)
      {
          llnode *tmp;

          if ((tmp = malloc(sizeof(*tmp))) == NULL) {
              ERR_MSG(malloc);
              (void)exit(EXIT_FAILURE);
          }
          tmp->data = data_in;
          tmp->next = *head;
          *head = tmp;
      }

      /* ... inside some function ... */
      llnode *head = NULL;
      <type> some_data;
      /* ... initialize some_data ... */

      add(&head, some_data);

  What's happening here? We created a head pointer, and then sent the address-of the head pointer into the add function, which is expecting a pointer to a pointer. Inside add, a tmp node is allocated on the heap. The data pointer on tmp is moved to point to the data_in. The next pointer is moved to point to the head pointer (*head). Then the head pointer is moved to point to tmp. Thus we have added to the beginning of the list.
o Removals: You traverse the list, querying the next struct in the list for the target. If you get a match, set the current node's next pointer to the target's next pointer. Don't forget to free the node you are removing (or you'll get a memory leak)! You need to take into consideration if the target is the first node in the list. There are many ways to do this (i.e. recursively). Think about it!
o Traversals: Traversing a list is simple, just query the data part of the node for pertinent information as you move from next to next. There are different methods for traversing trees (see Trees).
What about freeing the whole list? You can't just free the head pointer! You have to free the list. A sample function to free a complete list:

    void freelist(llnode *head)
    {
        llnode *tmp;

        while (head != NULL) {
            free(head->data); /* Don't forget to free memory within the list! */
            tmp = head->next;
            free(head);
            head = tmp;
        }
    }

Now we can rest easy at night because we won't have memory leaks in our lists!

Advanced Data Structures :: Stacks

Stacks are a specific kind of linked list. They are referred to as LIFO or Last In First Out. Stacks have specific adds and removes called push and pop. Pushing nodes onto stacks is easily done by adding to the front of the list. Popping is simply removing from the front of the list. It would be wise to give return values when pushing and popping from stacks. For example, pop can return the struct that was popped.

Advanced Data Structures :: Queues

Queues are FIFO or First In First Out. Think of a typical (non-priority) printer queue: The first jobs submitted are printed before jobs that are submitted after them. Queues aren't more difficult to implement than stacks. By creating a tail pointer you can keep track of both the front and the tail ends of the list. This allows you to enqueue onto the tail of the list, and dequeue from the front of the list.

Advanced Data Structures :: Hash Tables

So what's the problem with linked lists? Their efficiency isn't that great. (In Big-O notation, searching a linked list is O(n).) Is there a way to speed up data structures? Enter hash tables. Hash tables can provide O(1) performance on average while having the ability to grow dynamically. The key to a well-performing hash table is understanding the data that will be inserted into it. By custom tailoring an array of pointers, you can have O(1) access.

But you are asking, how do you know where a certain data piece is within the array? This is accomplished through a key. A key is based off the data; the simplest ones involve applying a modulus to a certain piece of information within the data. The general rule is that if a key sucks, the hash table sucks.

What about collisions (e.g. same key for two different pieces of information)?
There are many ways to resolve this, but the most popular way is through separate chaining. You can create a linked list from the array position to hold multiple data pieces, if necessary. Weiss provides a more in-depth study on hash tables, section 10.3, pg. 271-279.

Advanced Data Structures :: Trees

Another variation of a linked list is a tree. A simple binary tree involves having two types of "next" pointers, a left and a right pointer. By splitting your data into two different paths at every node, each step of a search in a balanced, ordered tree rules out about half of the remaining nodes, while keeping a uniform data structure. But trees can degrade into linked list efficiency if they become unbalanced. There are different types of trees; some popular ones are self-balancing. AVL trees are a typical example: they move nodes around so that the heights of the two subtrees under any node never differ by more than 1. If you want more information on trees or self-balancing trees, you can query google about this.

Advanced Data Structures Review

Linked lists, stacks, queues, hash tables, and trees are all different types of data structures that can help accommodate almost any type of data. Other data structures exist, such as graphs, but they are beyond the scope of this tutorial. If you want a more in-depth look at the data structures discussed here, refer to Weiss chapter 10, pg. 257-291 and chapter 11, pg. 311-318 for information on binary search trees. For more information on recursive functions, see Weiss chapter 11, pg. 294-311.

Make and Makefiles Overview

Make allows a programmer to easily keep track of a project by maintaining current versions of their programs from separate sources. Make can automate various tasks for you, not only compiling the proper branch of source code from the project tree, but also helping you automate other tasks, such as cleaning directories, organizing output, and even debugging.

Make and Makefiles :: An Introduction

If you had a program called hello.c, you could simply run make hello in the directory, and make's built-in rules would call cc (gcc) with the -o hello option.
But this isn't why make is such a nice tool for program building and management. The power and ease of use of make come from the Makefile. Make parses the Makefile for rules and, according to the parameters you give make, executes those rules. Rules take the following form:

    target_name : prerequisites ...
    	command
    	...

The target is the parameter you give make. For example, make clean would cause make to carry out the rule for the target_name called clean. If there are any prerequisites to process, then make will do those before proceeding. The commands listed under the target are then executed. NOTE: The commands listed must be TABBED over!

Examples? The Makefile below is a very simple one taken from the GNU make manual:

    edit : main.o kbd.o command.o display.o insert.o search.o files.o utils.o
    	cc -o edit main.o kbd.o command.o display.o insert.o search.o \
    	   files.o utils.o
    main.o : main.c defs.h
    	cc -c main.c
    kbd.o : kbd.c defs.h command.h
    	cc -c kbd.c
    command.o : command.c defs.h command.h
    	cc -c command.c
    display.o : display.c defs.h buffer.h
    	cc -c display.c
    insert.o : insert.c defs.h buffer.h
    	cc -c insert.c
    search.o : search.c defs.h buffer.h
    	cc -c search.c
    files.o : files.c defs.h buffer.h command.h
    	cc -c files.c
    utils.o : utils.c defs.h
    	cc -c utils.c
    clean :
    	rm edit main.o kbd.o command.o display.o insert.o search.o \
    	   files.o utils.o

Now if you change just kbd.c, make will only recompile kbd.c into its object file and then relink all of the object files to create edit. Much easier than recompiling the whole project!

But that's still too much stuff to write! Use make's smarts to deduce commands. The above example re-written (taken from the GNU make manual):

    objects = main.o kbd.o command.o display.o insert.o search.o files.o utils.o
    edit : $(objects)
    	cc -o edit $(objects)

    main.o : defs.h
    kbd.o : defs.h command.h
    command.o : defs.h command.h
    display.o : defs.h buffer.h
    insert.o : defs.h buffer.h
    search.o : defs.h buffer.h
    files.o : defs.h buffer.h command.h
    utils.o : defs.h

    .PHONY: clean
    clean :
    	rm edit $(objects)

So what changed? Now we have a variable, objects, that groups all of our object files, so the edit target only needs this variable. You may also notice that all of the .c files are missing from the prerequisite lines. This is because make deduces that the C source is a required part of the target and will automatically use the C source file associated with each object file to compile it.

What about the .PHONY target? Let's say you actually have a file called "clean". If you had just a clean target without the .PHONY, it would never clean: the file already exists and the target has no prerequisites, so make considers it up to date. To avoid this, you can use the .PHONY target. This isn't used that often because it is rare to have a file called "clean" in the target directory... but who knows if you might have one?

Make and Makefiles :: Beyond Simple

You can include other Makefiles by using the include directive. You can create conditional syntax in Makefiles, using ifdef, ifeq, ifndef, ifneq. You can create variables inside of Makefiles, like the $(objects) above.

Let's use a different example. The hypothetical source tree, one level per line from the root down:

    moo.c
        foo.c   bar.c
            baz.c   loop.h   dood.c   shazbot.c
                mop.c   <libgen.h>   woot.c   defs.h

Let's create a more complex, yet easier to maintain Makefile for this project:

    # Source, Executable, Includes, Library Defines
    INCL = loop.h defs.h
    SRC  = moo.c foo.c bar.c baz.c dood.c shazbot.c mop.c woot.c
    OBJ  = $(SRC:.c=.o)
    LIBS = -lgen
    EXE  = moolicious
    # Compile and Assemble C Source Files into Object Files
    %.o: %.c
    	$(CC) -c $(CFLAGS) $*.c
    # Link all Object Files with external Libraries into Binaries
    $(EXE): $(OBJ)
    	$(CC) $(LDFLAGS) $(OBJ)
    # Objects depend on these Include Files
    $(OBJ): $(INCL)
    # Create a gdb/dbx Capable Executable with DEBUG flags turned on
    debug:
    	$(CC) $(CFDEBUG) $(SRC)
    # Clean Up Objects, Executables, Dumps out of source directory
    clean:
    	$(RM) $(OBJ) $(EXE) core a.out

Now we have a clean and readable Makefile that can manage the complete source tree. (Remember to use tabs on the command lines!) Note that the rules rely on CFLAGS, LDFLAGS, and CFDEBUG being set to suit your project; for the link step to produce moolicious, LDFLAGS would need to carry -o $(EXE) and $(LIBS). CC and RM have built-in defaults in GNU make.

You can manipulate lots and lots of things with Makefiles. I cannot possibly cover everything in-depth in this short tutorial. You can navigate between directories (which can have separate Makefiles/rules), run shell commands, and perform various other tasks with make.

Make and Makefiles :: Where to go from here?

A lot of this tutorial references the GNU make manual. If you have any questions about Make, the GNU manual should cover it.

GNU autoconf is a tool for automatically generating configure files from a configure.in file. These configure files can automatically set up Makefiles in conjunction with GNU automake and GNU m4. These tools are way beyond the scope of this document. Look in GNU's manual repository for more information on these tools.

CVS is another tool that may be useful for very large projects. CVS stands for Concurrent Versions System and it allows you to record the history of your source files. CVS stores the base source and then stores the differences for each version. CVS also protects the code pieces of a multi-developer effort from accidental overwriting... in other words, code-insulation. More information on CVS can be found in its documentation.

Debugging Techniques

Now that we have learned the basics of Makefiles, we can look into debugging our code in conjunction with Makefiles. As an introduction we will be using three debugging techniques:

    1. Non-interactive
    2. GNU gdb
    3. dbx

Debugging Techniques :: Non-interactive

You can debug your code by placing #ifdef DEBUG and corresponding #endif statements around debug code. For example:

    #ifdef DEBUG
    PRINTF(("Variables Currently Contain: %d, %f, %s\n", *pi, *pf[1], str));
    #endif

You can specify a DEBUG define at compile time by issuing gcc with the -DDEBUG command option.
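As a minimal sketch of the #ifdef DEBUG technique, the function below traces its work only when the file is compiled with -DDEBUG. The helper name sum_ints is hypothetical, and plain fprintf stands in for the tutorial's PRINTF macro, whose definition belongs to the I/O section and is not repeated here.

```c
#include <stdio.h>

/* Hypothetical helper (not from the tutorial): sums n ints and traces
 * each step on stderr, but only in builds compiled with -DDEBUG. */
int sum_ints(const int *values, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++) {
        total += values[i];
#ifdef DEBUG
        /* This statement is removed by the preprocessor unless DEBUG is defined. */
        fprintf(stderr, "i = %d, value = %d, total = %d\n", i, values[i], total);
#endif
    }
    return total;
}
```

Built with gcc -DDEBUG, each loop iteration is reported on standard error; built without the flag, the function returns the same result and prints nothing.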
Note: This can be even further simplified into a single command called DPRINTF, so you don't even have to write the #ifdef #endif directives! How? Look at the Programming Tips and Tricks section (Quick Debugging Statements).

Debugging Techniques :: GNU gdb

gdb is a powerful program for tracking down Segmentation Faults and Core Dumps. It can be used for a variety of other debugging purposes, though.

The first thing you must do is compile with the -g option and without any optimization (i.e. no -O2 flag). Once you do that, you can run gdb <exe>, where <exe> is the name of the executable. gdb should load with the executable to run on.

Now you can create breakpoints where you want the execution to stop. This can be specified with the line number in the corresponding C source file. For example, break 376 would instruct gdb to stop at line 376.

You can now run the program by issuing the run command. If your program requires command-line options or parameters, you can specify them with the run command. For example: run 4 -s Doc! where 4, -s, Doc! are the parameters.

The program should run until the breakpoint or exit on a failure. If it fails before the breakpoint, you need to re-examine where you should specify the break. Repeat the breakpoint step and rerun. If your program stops and shows you the breakpoint line, then you can step into the function.

To step into the function, use the step command. NOTE: Do not step into system library calls (e.g. printf). You can use the next command to step over these types of calls, or over local function calls you don't wish to step into. You can repeat the last command by simply pressing enter.

You can use the continue command to tell gdb to continue executing until the next breakpoint or until it finishes the program.

If you want to peek at variables, you can issue the print command on the variable. For example: print mystruct->data. You can also set variables using the set command. For example: set mystruct->data = 42 (or set var mystruct->data = 42, which avoids clashes with gdb's own set subcommands).
The ptype command can tell you what type a particular variable is.

The commands instruction attaches a list of gdb commands to a breakpoint. For example, commands 1 will allow you to enter a variable number of other commands (one per line, ending with "end"), and gdb will run those commands and report their output to you each time breakpoint 1 is hit.

The clear command tells gdb to clear a specified breakpoint. The list command can tell you where you are in the particular code block. You can specify breakpoints not only with line numbers but also with function names. For more information on other commands, you can issue the help command inside gdb.

Debugging Techniques :: dbx

dbx is a multi-threaded program debugger. This program is great for tracking down memory leaks. dbx is not found on Linux machines (it can be found on Solaris or other *NIX machines).

Run dbx with the executable, as with gdb. Now you can set arguments with runargs. After doing that, issue the check -memuse command. This will check for memory use. If you also want to check for access violations, you can use the check -all command. Run the program using the run command. If you get any access violations or memory leaks, dbx will report them to you. Run the help command if you need to understand other commands or find equivalents of gdb commands.

Debugging Techniques :: Other Debuggers

With Gnome 1.4, there is a program called MemProf. It is a memory profiler that can detect leaks. Although I have not personally used it, it could be a great graphical tool to use in finding those nasty memory leaks!

strace is another program that can trace a program, recording the system calls it makes. Although the output is much harder to parse than that of the other programs, it can be very useful in tracking down problems with your code.

Creating Libraries

If you have a bunch of files that contain just functions, you can turn these source files into libraries that can be used statically or dynamically by programs. This is good for program modularity and code re-use.
Write Once, Use Many. A library is basically just an archive of object files.

Creating Libraries :: Static Library Setup

The first thing you must do is create your C source files containing any functions that will be used. Your library can contain multiple object files. After creating the C source files, compile the files into object files. To create a library:

    ar rc libmylib.a objfile1.o objfile2.o objfile3.o

This will create a static library called libmylib.a. Replace the "mylib" portion of the library name with whatever you want. Next:

    ranlib libmylib.a

This creates an index inside the library. That should be it! If you plan on copying the library, remember to use the -p option with cp to preserve permissions and timestamps.

Creating Libraries :: Static Library Usage

Remember to prototype your library function calls so that you do not get implicit declaration errors. When linking your program to the libraries, make sure you specify where the library can be found:

    gcc -o foo foo.o -L. -lmylib

The -L. piece tells gcc to look in the current directory in addition to the other library directories for finding libmylib.a. List the library after the object files that use it, because the linker resolves symbols from left to right. You can easily integrate this into your Makefile (even the Static Library Setup part)!

Creating Libraries :: Shared Library Setup

Creating shared or dynamic libraries is simple also. Using the previous example, to create a shared library:

    gcc -fPIC -c objfile1.c
    gcc -fPIC -c objfile2.c
    gcc -fPIC -c objfile3.c
    gcc -shared -o libmylib.so objfile1.o objfile2.o objfile3.o

The -fPIC option tells the compiler to create Position Independent Code (libraries built with relative addresses rather than absolute addresses, because these libraries can be loaded multiple times). The -shared option specifies that an architecture-dependent shared library is being created. However, not all platforms support this flag. Now we have to compile the actual program using the libraries:

    gcc -o foo foo.o -L. -lmylib

Notice it is exactly the same as linking against a static library.
Although it is compiled in the same way, none of the actual library code is inserted into the executable, hence the dynamic/shared library. Note: You can automate this process using Makefiles!

Creating Libraries :: Shared Library Usage

Since a program that uses static libraries already has the library code compiled into it, it can run on its own. A program that uses shared libraries loads them at run-time, so it needs to know where the shared library is stored.

What's the advantage of creating executables using Dynamic Libraries? The executable is much smaller than with static libraries. If it is a standard library that can be installed, there is no need to compile it into the executable at compile time!

The key to making your program work with dynamic libraries is the LD_LIBRARY_PATH environment variable. To display this variable, at a shell:

    echo $LD_LIBRARY_PATH

This will display the variable if it is already defined. If it isn't, you can create a wrapper script for your program to set this variable at run-time. Depending on your shell, simply use the setenv (tcsh, csh) or export (bash, sh, etc) commands. If you already have LD_LIBRARY_PATH defined, make sure you append to the variable, not overwrite it! For example:

    setenv LD_LIBRARY_PATH /path/to/library:${LD_LIBRARY_PATH}

would be the command you would use if you had tcsh/csh and already had an existing LD_LIBRARY_PATH. If you didn't have it already defined, just remove the : and everything right of it. An example with bash shells:

    export LD_LIBRARY_PATH=/path/to/library:${LD_LIBRARY_PATH}

Again, remove the stuff right of the : and the : itself if you don't already have an existing LD_LIBRARY_PATH.

If you have administrative rights to your computer, you can install the particular library to the /usr/local/lib directory and permanently add an LD_LIBRARY_PATH into your .tcshrc, .cshrc, .bashrc, etc. file.

Programming Tips and Tricks

Below is a listing of a few tips you can use when you are programming.

This article has been translated into Serbo-Croatian.

Quick Commenting

Sometimes you may find yourself trying to comment out blocks of code which have comments within them. Because C does not allow nested comments, you may find that the inner */ comment end prematurely terminates your comment block. You can utilize the C Preprocessor's #if directive to circumvent this:

    #if 0
    /* This code here is the stuff we want commented */
    if (a != 0) {
        b = 0;
    }
    #endif

Quick Debugging Statements

In the C Preprocessor section, we mentioned that you could turn debugging statements on and off by using a #define. Expanding on that, it is even more convenient if you write a macro (using the PRINTF() macro from the I/O section):

    #ifdef DEBUG
    #define DPRINTF(s) PRINTF(s)
    #else
    #define DPRINTF(s)
    #endif

Now you can write DPRINTF(("Debugging statement")); for debugging statements! This can be turned on and off using the -DDEBUG gcc flag.

Quick man Lookup in vim or emacs

In vim, move your cursor over the standard library function call you want to look up, or any other word that might be in the man pages. Press K (capital k).

In emacs, open up your .emacs file and add this line:

    (global-set-key [(f1)] (lambda () (interactive) (manual-entry (current-word))))

Now you can load up emacs, put the cursor on the word in question, and press the F1 key to load up the man page on it. You can replace the F1 key with anything you wish.
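To make the Quick Debugging Statements tip concrete, here is a small sketch of DPRINTF in use. The PRINTF definition shown is only an assumed stand-in (the real one lives in the I/O section), chosen because the double parentheses in the tutorial's calls suggest a macro of this shape; the function scale is purely hypothetical.

```c
#include <stdio.h>

/* Assumption: a stand-in for the tutorial's PRINTF macro. The extra
 * parentheses at the call site let one macro argument carry a whole
 * printf argument list. */
#define PRINTF(args) printf args

#ifdef DEBUG
#define DPRINTF(s) PRINTF(s)
#else
#define DPRINTF(s)          /* expands to nothing in non-debug builds */
#endif

/* Hypothetical function used only to show DPRINTF at a call site. */
int scale(int x)
{
    DPRINTF(("scale: x = %d\n", x));
    return 2 * x;
}
```

Compiled with gcc -DDEBUG, every call to scale prints its argument; compiled without the flag, the DPRINTF line collapses to an empty statement and costs nothing at run-time.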