
Best practices for programming in C

http://www.ibm.com/developerworks/aix/library/au-hook...



Shiv Dutta, Technical Consultant, IBM, Software Group
Gary Hook (ghook@us.ibm.com), Senior Technical Consultant, IBM, Software Group

Summary: Although the C language has been around for close to 30 years, its appeal has not yet worn off. It continues to attract a large number of people who must develop new skills for writing new applications, or for porting or maintaining existing applications.

Date: 26 Jun 2003
Level: Intermediate
Also available in: Russian

Introduction

This article has been written with the needs of the developer in mind. We have put together a set of guidelines that have served us well as developers and consultants over the years, and we offer these as suggestions that may help you in your job. You may not agree with all of them, but our hope is that you will like some of them and use them in your programming or porting projects.

Styles and Guidelines

Use a source code style that makes the code readable and consistent. Unless you have a group code style or a style of your own, you could use a style similar to the Kernighan and Ritchie style used by a vast majority of C programmers. Taken to an extreme, however, it's possible to end up with something like this:
int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\
o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}

--Dishonorable mention, Obfuscated C Code Contest, 1984. Author requested anonymity.

It is common to see the main routine defined as main(). The ANSI way of writing this is int main(void) (if there is no interest in the command line arguments) or int main(int argc, char **argv). Pre-ANSI compilers would omit the void declaration, or list the variable names and follow with their declarations.

Whitespace

Use vertical and horizontal whitespace generously. Indentation and spacing should reflect the block structure of the code. A long string of conditional operators should be split onto separate lines. For example:
if (foo->next==NULL && number < limit && limit <=SIZE && node_active(this_input)) {...

might be better as:


if (foo->next == NULL
    && number < limit && limit <= SIZE
    && node_active(this_input))
{
    ...

Similarly, elaborate for loops should be split onto different lines:

for (curr = *varp, trail = varp;
     curr != NULL;
     trail = &(curr->next), curr = curr->next)
{
    ...

Other complex expressions, such as those using the ternary ?: operator, are best split onto several lines, too.

z = (x == y)
    ? n + f(x)
    : f(y) - n;

Comments

Comments should describe what is happening, how it is being done, what parameters mean, which globals are used, and any restrictions or bugs. However, avoid unnecessary comments. If the code is clear and uses good variable names, it should be able to explain itself. Since comments are not checked by the compiler, there is no guarantee they are right. Comments that disagree with the code are of negative value. Too many comments clutter code.

Here is a superfluous comment style:
i=i+1; /* Add one to i */

It's pretty clear that the variable i is being incremented by one. And there are worse ways to do it:

/************************************
*                                   *
*          Add one to i             *
*                                   *
************************************/

i=i+1;

Naming Conventions

Names with leading and trailing underscores are reserved for system purposes and should not be used for any user-created names. Convention dictates that:

1. #define constants should be in all CAPS.
2. enum constants are Capitalized or in all CAPS.
3. Function, typedef, and variable names, as well as struct, union, and enum tag names, should be in lower case.

For clarity, avoid names that differ only in case, like foo and Foo. Similarly, avoid foobar and foo_bar. Avoid names that look like each other. On many terminals and printers, 'l', '1' and 'I' look quite similar. A variable named 'l' is particularly bad because it looks so much like the constant '1'.

Variable names

When choosing a variable name, length is not important but clarity of expression is. A long name can be used for a global variable that is rarely used, but an array index used on every line of a loop need not be named any more elaborately than i. Using 'index' or 'elementnumber' instead is not only more to type but can also obscure the details of the computation. With long variable names it is sometimes harder to see what is going on. Consider:

for (i = 0; i <= 100; i++) array[i] = 0;

versus

for (elementnumber = 0; elementnumber <= 100; elementnumber++) array[elementnumber] = 0;
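The naming conventions listed above might look like this in practice. This is only a sketch: every identifier in it (MAX_NODES, the colour enum, node_t, count_active) is made up for illustration and does not come from any real code base.

```c
#include <stddef.h>

#define MAX_NODES 64                /* #define constant: all caps */

enum colour { RED, GREEN, BLUE };   /* enum constants: all caps */

typedef struct node {               /* tag and typedef names: lower case */
    int active;
    enum colour shade;
} node_t;

/* Function and variable names are lower case; the loop index is just i. */
size_t count_active(const node_t *nodes, size_t n)
{
    size_t i, total = 0;

    for (i = 0; i < n; i++) {
        if (nodes[i].active)
            total++;
    }
    return total;
}
```

A caller would declare something like node_t nodes[MAX_NODES] and pass the number of entries actually in use.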

Function names

Function names should reflect what they do and what they return. Functions are used in expressions, often in an if clause, so they need to read appropriately. For example:

if (checksize(x))

is unhelpful because it does not tell us whether checksize returns true on error or non-error; instead:

if (validsize(x))

makes the point clear.

Declarations

All external data declarations should be preceded by the extern keyword.

The "pointer" qualifier, '*', should be with the variable name rather than with the type:

char *s, *t, *u;

instead of

char* s, t, u;

The latter statement is not wrong, but is probably not what is desired, since 't' and 'u' do not get declared as pointers.

Header Files

Header files should be functionally organized; that is, declarations for separate subsystems should be in separate header files. Also, declarations that are likely to change when code is ported from one platform to another should be in a separate header file.

Avoid private header filenames that are the same as library header filenames. The statement #include "math.h" includes the standard library math header file if the intended one is not found in the current directory. If this is what you want to happen, comment this fact.

Finally, using absolute pathnames for header files is not a good idea. The "include-path" option of the C compiler (-I (capital "eye") on many systems) is the preferred method for handling extensive private libraries of header files; it permits reorganizing the directory structure without having to alter source files.

scanf

scanf should never be used in serious applications. Its error detection is inadequate. Look at the example below:


#include <stdio.h>

int main(void)
{
    int i;
    float f;

    printf("Enter an integer and a float: ");
    scanf("%d %f", &i, &f);
    printf("I read %d and %f\n", i, f);
    return 0;
}

Test run

Enter an integer and a float: 182 52.38
I read 182 and 52.380001

Another test run

Enter an integer and a float: 6713247896 4.4
I read -1876686696 and 4.400000

In the second run the integer is too large for an int, yet scanf gives no indication of the problem and the program prints a meaningless value.

++ and --

When the increment or decrement operator is used on a variable in a statement, that variable should not appear more than once in the statement, because the order of evaluation is compiler-dependent. Do not write code that assumes an order, or that functions as desired on one machine but does not have a clearly defined behavior:
int i = 0, a[5];

a[i] = i++; /* assign to a[0]? or a[1]? */
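One way to remove the ambiguity is to give the increment a statement of its own. The following is a minimal sketch of that idea; the helper name fill_sequence is hypothetical.

```c
#include <stddef.h>

/* Store 0, 1, 2, ... into a[0..n-1]. The index is read in one
   statement and incremented in the next, so the result does not
   depend on the order in which the compiler evaluates operands. */
void fill_sequence(int *a, size_t n)
{
    size_t i = 0;

    while (i < n) {
        a[i] = (int)i;   /* uses the current value of i */
        i++;             /* the increment gets its own statement */
    }
}
```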

Don't let yourself believe you see what isn't there. Look at the following example:

while (c == '\t' || c = ' ' || c == '\n') c = getc(f);

The statement in the while clause appears at first glance to be valid C. The use of the assignment operator, rather than the comparison operator, results in syntactically incorrect code. The precedence of = is lower than that of == and ||, so it would have to be interpreted this way (parentheses added for clarity):
while ((c == '\t' || c) = (' ' || c == '\n')) c = getc(f);

The clause on the left side of the assignment operator is:


(c == '\t' || c)

which does not result in an lvalue. If c contains the tab character, the result is "true" and no further evaluation is performed, and "true" cannot stand on the left-hand side of an assignment.


Be clear in your intentions.

When you write something that could be mistaken for something else, use parentheses or other methods to make sure your intent is clear. This helps you understand what you meant if you ever have to deal with the program at a later date, and it makes things easier if someone else has to maintain the code.

It is sometimes possible to code in a way that anticipates likely mistakes. For example, you can put constants on the left of equality comparisons. That is, instead of writing:
while (c == '\t' || c == ' ' || c == '\n') c = getc(f);

You can say:


while ('\t' == c || ' ' == c || '\n' == c) c = getc(f);

This way, if you mistype == as =, you will get a compiler diagnostic:


while ('\t' = c || ' ' == c || '\n' == c) c = getc(f);

This style lets the compiler find problems; the above statement is invalid because it tries to assign a value to '\t'.

Trouble from unexpected corners.

C implementations generally differ from each other in some aspects. It helps to stick to the parts of the language that are likely to be common to all implementations. By doing that, it will be easier to port your program to a new machine or compiler, and less likely that you will run into compiler idiosyncrasies. For example, consider the string:
/*/*/2*/**/1

This takes advantage of the "maximal munch" rule. If comments nest, it is interpreted this way:
/* /* /2 */ * */ 1

The two /* symbols match the two */ symbols, so the value of this is 1. If comments do not nest, on some systems, a /* in a comment is ignored. On others a warning is flagged for /*. In either case, the expression is interpreted this way:
/* / */ 2 * /* */ 1

2 * 1 evaluates to 2.

Flushing Output Buffer

When an application terminates abnormally, the tail end of its output is often lost. The application may not have the opportunity to completely flush its output buffers. Part of the output may still be sitting in memory somewhere and is never written out. On some systems, this output could be several pages long.

Losing output this way can be misleading because it may give the impression that the program failed much earlier than it actually did. The way to address this problem is to force the output to be unbuffered, especially when debugging. The exact incantation for this varies from system to system but usually looks something like this:
setbuf(stdout, (char *) 0);


This must be executed before anything is written to stdout. Ideally this could be the first statement in the main program.

getchar() - macro or function

The following program copies its input to its output:

#include <stdio.h>

int main(void)
{
    register int a;

    while ((a = getchar()) != EOF)
        putchar(a);
}

Removing the #include statement from the program would cause it to fail to compile because EOF would then be undefined. We can rewrite the program in the following way:

#define EOF -1

int main(void)
{
    register int a;

    while ((a = getchar()) != EOF)
        putchar(a);
}

This will work on many systems, but on some it will run much more slowly. Since function calls usually take a long time, getchar is often implemented as a macro. This macro is defined in stdio.h, so when #include <stdio.h> is removed, the compiler does not know what getchar is. On some systems it assumes that getchar is a function that returns an int.

In reality, many C implementations have a getchar function in their libraries, partly to safeguard against such lapses. Thus, in situations where #include <stdio.h> is missing, the compiler uses the function version of getchar. The overhead of the function call makes the program slower. The same argument applies to putchar.

null pointer

A null pointer does not point to any object. Thus it is illegal to use a null pointer for any purpose other than assignment and comparison.

Never redefine the NULL symbol. The NULL symbol should always have a constant value of zero. A null pointer of any given type will always compare equal to the constant zero, whereas comparison with a variable with value zero or to some non-zero constant has implementation-defined behaviour.

Dereferencing a null pointer may cause strange things to happen.

What does a+++++b mean?

The only meaningful way to parse this is:
a ++ + ++ b


However, the maximal munch rule requires it to be broken down as:


a ++ ++ + b

This is syntactically invalid: it is equivalent to:


((a++)++) + b

But the result of a++ is not an lvalue and hence is not acceptable as an operand of ++. Thus the rules for resolving lexical ambiguity make it impossible to resolve this example in a way that is syntactically meaningful. In practice, of course, the prudent thing to do is to avoid constructions like this unless you are absolutely certain what they mean. Adding whitespace helps the compiler to understand the intent of the statement, but it is preferable (from a code maintenance perspective) to split this construct into more than one line:
++b;
(a++) + b;
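Taking the split one step further, each side effect can get its own statement. The sketch below assumes the intended meaning was a++ + ++b; the function name sum_then_step is hypothetical, and the pointers exist only so that the caller can observe the increments.

```c
/* Computes what "a++ + ++b" was presumably meant to compute,
   with one side effect per statement. */
int sum_then_step(int *a, int *b)
{
    int result;

    ++*b;                 /* pre-increment b first */
    result = *a + *b;     /* add the old a to the new b */
    ++*a;                 /* the post-increment of a takes effect last */
    return result;
}
```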

Treat functions with care

Functions are the most general structuring concept in C. They should be used to implement "top-down" problem solving, namely breaking up a problem into smaller and smaller subproblems until each piece is readily expressed in code. This aids modularity and documentation of programs. Moreover, programs composed of many small functions are easier to debug.

Cast all function arguments to the expected type if they are not of that type already, even when you are convinced that this is unnecessary, since they may hurt you when you least expect it. In other words, the compiler will often promote and convert data types to conform to the declaration of the function parameters. But doing so manually in the code clearly explains the intent of the programmer, and may ensure correct results if the code is ever ported to another platform.

If the header files fail to declare the return types of the library functions, declare them yourself. Surround your declarations with #ifdef/#endif statements in case the code is ever ported to another platform.

Function prototypes should be used to make code more robust and to make it run faster.

Dangling else

Stay away from the "dangling else" problem unless you know what you're doing:

if (a == 1)
    if (b == 2)
        printf("***\n");
    else
        printf("###\n");

The rule is that an else attaches to the nearest if. When in doubt, or if there is a potential for ambiguity, add curly braces to illuminate the block structure of the code.

Array bounds

Check the array bounds of all arrays, including strings, since where you type "fubar" today someone someday may type "floccinaucinihilipilification". Robust production software should not use gets().

The fact that C subscripts start from zero makes all kinds of counting problems easier. However, it requires some effort to learn to handle them.
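As a sketch of bounded input handling, the code below reads a line with fgets, which never writes past the buffer it is given, and converts it with strtol, which reports overflow. It is one possible replacement for both gets() and the scanf call criticized earlier, not the only one; the names parse_int and read_int and the 64-byte buffer size are assumptions made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a decimal int from text. Returns 1 on success and stores
   the value in *out; returns 0 for empty input, trailing junk, or
   a value that does not fit in an int. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)
        return 0;                        /* no digits at all */
    while (*end == ' ' || *end == '\n')
        end++;                           /* allow trailing blanks/newline */
    if (*end != '\0')
        return 0;                        /* trailing junk */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                        /* out of range */
    *out = (int)value;
    return 1;
}

/* Read one line into a fixed-size buffer and parse it.
   Returns 1 on success, 0 otherwise. */
int read_int(FILE *in, int *out)
{
    char line[64];

    if (fgets(line, sizeof line, in) == NULL)
        return 0;                        /* end of file or read error */
    return parse_int(line, out);
}
```

With this version, the input 6713247896 from the scanf test run is rejected instead of being silently turned into a negative number. A production version would also need to decide what to do with lines longer than the buffer.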

Null statement

The null body of a for or while loop should be alone on a line and commented so that it is clear that the null body is intentional and not missing code.
while (*dest++ = *src++)
    ;   /* VOID */
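Wrapped in a complete function, the same loop might look like the sketch below. The name copy_string is hypothetical, and the explicit comparison with '\0' is an addition that makes the intent of the assignment in the condition visible.

```c
/* Copy the string at src, including its terminating '\0', to dest.
   The caller must make sure dest is large enough. Returns the
   original dest, like the standard strcpy. */
char *copy_string(char *dest, const char *src)
{
    char *start = dest;

    while ((*dest++ = *src++) != '\0')
        ;   /* VOID: all the work is done in the loop condition */

    return start;
}
```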

Test for true or false

Do not default the test for non-zero, that is:
if (f() != FAIL)

is better than
if (f())

even though FAIL may have the value 0, which C considers to be false. (Of course, balance this against constructs such as the one shown above in the "Function names" section.) An explicit test will help you out later when somebody decides that a failure return should be -1 instead of 0.

A frequent trouble spot is using the strcmp function to test for string equality, where the result should never be defaulted. The preferred approach is to define a macro STREQ:
#define STREQ(str1, str2) (strcmp((str1), (str2)) == 0)

Using this, a statement such as:


if ( STREQ( inputstring, somestring ) ) ...

carries with it an implied behavior that is unlikely to change under the covers (folks tend not to rewrite and redefine standard library functions like strcmp()).

Do not check a boolean value for equality with 1 (TRUE, YES, etc.); instead test for inequality with 0 (FALSE, NO, etc.). Most functions are guaranteed to return 0 if false, but only non-zero if true. Thus,
if (func() == TRUE) {...

is better written
if (func() != FALSE)
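Both pieces of advice can be seen together in a short sketch. STREQ is the macro from the text; FALSE is defined locally for the example, and the functions is_yes and starts_with_digit are hypothetical. The standard isdigit is a convenient illustration because it promises only a non-zero result for a digit, not 1.

```c
#include <ctype.h>
#include <string.h>

#ifndef FALSE
#define FALSE 0
#endif

#define STREQ(str1, str2) (strcmp((str1), (str2)) == 0)

/* Returns non-zero if word is one of the accepted "yes" answers. */
int is_yes(const char *word)
{
    return STREQ(word, "yes") || STREQ(word, "y");
}

/* isdigit() returns some non-zero value for a digit, so the result
   is compared against FALSE rather than against TRUE. */
int starts_with_digit(const char *text)
{
    return isdigit((unsigned char)text[0]) != FALSE;
}
```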

Embedded statement

There is a time and a place for embedded assignment statements. In some constructs there is no better way to accomplish the results without resulting in bulkier and less readable code:
while ((c = getchar()) != EOF) {
    /* process the character */
}
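The same idiom works with other library calls that return a sentinel value. As a sketch, the hypothetical function count_char below uses an assignment inside the loop condition to advance a strchr search and to test for "not found" in one place.

```c
#include <stddef.h>
#include <string.h>

/* Count how many times the character ch occurs in text.
   ch must not be '\0'. */
size_t count_char(const char *text, int ch)
{
    const char *p = text;
    size_t count = 0;

    while ((p = strchr(p, ch)) != NULL) {
        count++;
        p++;            /* step past the match before searching again */
    }
    return count;
}
```

Writing this without the embedded assignment would need the strchr call twice, once before the loop and once at the end of its body.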


Using embedded assignment statements to improve run-time performance is possible. However, you should consider the tradeoff between increased speed and decreased maintainability that results when embedded assignments are used in artificial places. For example:
x = y + z;
d = x + r;

should not be replaced by:


d = (x = y + z) + r;

even though the latter may save one cycle. In the long run the time difference between the two will decrease as the optimizer is enhanced, while the difference in ease of maintenance will increase.

goto statements

goto should be used sparingly. The one place where it can be usefully employed is to break out of several levels of switch, for, and while nesting, although the need to do such a thing may indicate that the inner constructs should be broken out into a separate function.
for (...) {
    while (...) {
        ...
        if (wrong)
            goto error;
    }
}
...
error:
    /* print a message */
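A compilable version of that outline could look like the sketch below. The grid dimensions, the function name check_grid, and the rule that negative entries are errors are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

#define ROWS 3
#define COLS 4

/* Returns 0 if every entry is non-negative, -1 otherwise. */
int check_grid(int grid[ROWS][COLS])
{
    size_t r, c;

    for (r = 0; r < ROWS; r++) {
        for (c = 0; c < COLS; c++) {
            if (grid[r][c] < 0)
                goto error;     /* leave both loops at once */
        }
    }
    return 0;

error:      /* reached only from the goto above */
    fprintf(stderr, "negative entry at row %lu, column %lu\n",
            (unsigned long)r, (unsigned long)c);
    return -1;
}
```

In a function this small, returning directly from the inner loop would work just as well; the goto earns its place when several exits have to share the same cleanup or reporting code.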

When a goto is necessary, the accompanying label should be alone on a line and either tabbed one stop to the left of the code that follows, or set at the beginning of the line. Both the goto statement and its target should be commented to explain their utility and purpose.

Fall-through in switch

When a block of code has several labels, place the labels on separate lines. This style agrees with the use of vertical whitespace, and makes rearranging the case options a simple task, should that be required. The fall-through feature of the C switch statement must be commented for future maintenance. If you've ever been "bitten" by this feature, you'll appreciate its importance!
switch (expr) {
case ABC:
case DEF:
    statement;
    break;
case UVW:
    statement;
    /*FALLTHROUGH*/
case XYZ:
    statement;
    break;
}

While the last break is technically unnecessary, the consistency of its use prevents a fall-through error if another case is later added after the last one. The default case, if used, should always be last and does not require a final break statement if it is last.

Constants

Symbolic constants make code easier to read. Numerical constants should generally be avoided; use the #define function of the C preprocessor to give constants meaningful names. Defining the value in one place (preferably a header file) also makes it easier to administer large programs, since the constant value can be changed uniformly by changing only the define. Consider using the enumeration data type as an improved way to declare variables that take on only a discrete set of values. Using enumerations also lets the compiler warn you of any misuse of an enumerated type. At the very least, any directly-coded numerical constant must have a comment explaining the derivation of the value.

Constants should be defined consistently with their use; e.g. use 540.0 for a float instead of 540 with an implicit float cast. That said, there are some cases where the constants 0 and 1 may appear as themselves instead of as defines. For example, if a for loop indexes through an array, then:
for (i = 0; i < arraysub; i++)

is quite reasonable, while the code:


gate_t *front_gate = opens(gate[i], 7);
if (front_gate == 0)
    error("can't open %s\n", gate[i]);

is not. In the second example front_gate is a pointer; when a value is a pointer it should be compared to NULL instead of 0. Even simple values like 1 or 0 are often better expressed using defines like TRUE and FALSE (and sometimes YES and NO read better).

Don't use floating-point variables where discrete values are needed. This is due to the inexact representation of floating-point numbers (see the first test run in scanf, above, where 52.38 is read back as 52.380001). Test floating-point numbers using <= or >=; an exact comparison (== or !=) may not detect an "acceptable" equality.

Simple character constants should be defined as character literals rather than numbers. Non-text characters are discouraged as non-portable. If non-text characters are necessary, particularly if they are used in strings, they should be written using an escape sequence of three octal digits rather than one (for example, '\007'). Even so, such usage should be considered machine-dependent and treated as such.

Conditional Compilation

Conditional compilation is useful for things like machine dependencies, debugging, and setting certain options at compile time. Various controls can easily combine in unforeseen ways. If you use #ifdef for machine dependencies, make sure that when no machine is specified, the result is an error, not a default machine. The #error directive comes in handy for this purpose. And if you use #ifdef for optimizations, the default should be the unoptimized code rather than an uncompilable or incorrect program. Be sure to test the unoptimized code.

Miscellaneous

Utilities for compiling and linking, such as Make, considerably simplify the task of moving an application from one environment to another. During development, make recompiles only those modules that have been changed since the last time make was used.

Use lint frequently. lint is a C program checker that examines C source files to detect and report type incompatibilities, inconsistencies between function definitions and calls, potential program bugs, etc.
Also, investigate the compiler documentation for switches that encourage it to be "picky". The compiler's job is to be precise, so let it report potential problems by using appropriate command line options.

Minimize the number of global symbols in the application. One of the benefits is the lower probability of conflicts with system-defined functions.

Many programs fail when their input is missing. All programs should be tested for empty input. This is also likely to help you understand how the program is working.

Don't assume any more about your users or your implementation than you have to. Things that "cannot happen" sometimes do happen. A robust program will defend against them. If there's a boundary condition to be found, your users will somehow find it!
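Empty input is the simplest boundary condition to defend against. As a small sketch (the function name mean_of and its return convention are assumptions), an averaging routine can report failure for zero values instead of dividing by zero:

```c
#include <stddef.h>

/* Average of n values. Returns 1 and stores the mean in *mean on
   success, or 0 when n is 0 and there is nothing to average. */
int mean_of(const double *values, size_t n, double *mean)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0;       /* the "cannot happen" case, handled */

    for (i = 0; i < n; i++)
        sum += values[i];
    *mean = sum / (double)n;
    return 1;
}
```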

Never make any assumptions about the size of a given type, especially pointers.

When char types are used in expressions, some implementations will treat them as unsigned and others as signed. It is advisable to always cast them when used in arithmetic expressions.

Do not rely on the initialization of auto variables and of memory returned by malloc.

Make your program's purpose and structure clear.

Keep in mind that you or someone else will likely be asked to modify your code or make it run on a different machine sometime in the future. Craft your code so that it is portable to other machines.

Conclusion

It is common knowledge that the maintenance of applications takes a significant amount of a programmer's time. Part of the reason for this is the use of non-portable and non-standard features and less than desirable programming style when developing applications. In this article we have presented some guidelines which have stood us in good stead over the years. We believe that these guidelines, when followed, will make application maintenance easier in a team environment.

Reference

Obfuscated C and Other Mysteries by Don Libes, John Wiley and Sons, Inc., ISBN 0-471-57805-3
The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie, Second Edition, Prentice-Hall, ISBN 0-13-110370-9
Safer C by Les Hatton, McGraw-Hill, ISBN 0-07-707640-0
C Traps and Pitfalls by Andrew Koenig, AT&T Bell Laboratories, ISBN 0-201-17928-9

About the authors

Shiv Dutta works as a Technical Consultant in the IBM Systems and Technology Group where he assists independent software vendors with the enablement of their applications on pSeries servers. Shiv was one of the co-authors of AIX 5L Differences Guide Version 5.3 Edition redbook and has considerable experience as a software developer, system administrator, and an instructor. He provides AIX support in the areas of system administration, problem determination, performance tuning, and sizing guides. Shiv has worked with AIX from its inception. He holds a Ph.D. in Physics from Ohio University and can be reached at sdutta@us.ibm.com.

Gary R. Hook is a senior technical consultant at IBM, providing application development, porting, and technical assistance to independent software vendors. Mr. Hook's professional experience focuses on Unix-based application development. Upon joining IBM in 1990, he worked with the AIX Technical Support center in Southlake, Texas, providing consulting and technical support services to customers, with an emphasis upon AIX application architecture. Now residing in Austin, Mr. Hook was a member of the AIX Kernel Development team from 1995 through 2000, specializing in the AIX linker, loader, and general application development tools. You can contact him at ghook@us.ibm.com.
