
The syntax of the C programming language is a set of rules that specifies whether the sequence of
characters in a file is conforming C source code. The rules specify how the character sequences are to
be chunked into tokens (the lexical grammar), the permissible sequences of these tokens and some of
the meaning to be attributed to these permissible token sequences (additional meaning is assigned by
the semantics of the language).

C syntax makes use of the maximal munch principle.


* 1 Data structures
o 1.1 Primitive data types
+ 1.1.1 Integral types
+ 1.1.2 Enumerated type
+ 1.1.3 Floating point types
+ 1.1.4 Storage duration specifiers
+ 1.1.5 Type qualifiers
o 1.2 Pointers
+ 1.2.1 Referencing
+ 1.2.2 Dereferencing
o 1.3 Arrays
+ 1.3.1 Array definition
+ 1.3.2 Accessing elements
+ 1.3.3 Dynamic arrays
+ 1.3.4 Multidimensional arrays
o 1.4 Strings
+ 1.4.1 Backslash escapes
+ 1.4.2 String literal concatenation
+ 1.4.3 Character constants
+ 1.4.4 Wide character strings
+ 1.4.5 Variable width strings
+ 1.4.6 Library functions
o 1.5 Structures and unions
+ 1.5.1 Structures
+ 1.5.2 Unions
+ 1.5.3 Declaration
+ 1.5.4 Accessing members
+ 1.5.5 Initialization
+ 1.5.6 Assignment
+ 1.5.7 Other operations
+ 1.5.8 Bit fields
+ 1.5.9 Incomplete types
* 2 Operators
* 3 Control structures
o 3.1 Compound statements
o 3.2 Selection statements
o 3.3 Iteration statements
o 3.4 Jump statements
+ 3.4.1 Storing the address of a label
* 4 Functions
o 4.1 Syntax
+ 4.1.1 Function Pointers
o 4.2 Global structure
o 4.3 Argument passing
+ 4.3.1 Array parameters
* 5 Miscellaneous
o 5.1 Reserved keywords
o 5.2 Case sensitivity
o 5.3 Comments
o 5.4 Command-line arguments
o 5.5 Evaluation order
o 5.6 Undefined behavior
* 6 See also
* 7 References
* 8 External links

Data structures

Primitive data types

The C language represents numbers in three forms: integral, real and complex. This distinction reflects
similar distinctions in the instruction set architecture of most central processing units. Integral data
types store numbers in the set of integers, while real and complex types represent numbers (or pairs
of numbers) in the set of real numbers in floating point form.

All C integer types have signed and unsigned variants. If signed or unsigned is not specified explicitly,
in most circumstances signed is assumed. However, for historic reasons plain char is a type distinct
from both signed char and unsigned char. It may be a signed type or an unsigned type, depending on
the compiler and the character set (C guarantees that members of the C basic character set have positive
values). Also, bit field types specified as plain int may be signed or unsigned, depending on the
compiler.

Integral types

The integral types come in different sizes, with varying amounts of memory usage and range of
representable numbers. Modifiers are used to designate the size: short, long and long long[1]. The
character type, whose specifier is char, represents the smallest addressable storage unit, which is most
often an 8-bit byte (ISO C requires it to be at least 8 bits wide, and it may be larger). The standard
header limits.h defines the minimum and maximum values of the integral primitive data types, amongst
other limits.

The following table provides a list of the integral types and their common storage sizes. The first listed
number of bits is also the minimum required by ISO C. The last column is the equivalent exact-width
C99 types from the stdint.h header.
Common definitions of integral types

Implicit specifier(s)    Explicit specifier        Number of bits   Unambiguous type
signed char              same                      8                int8_t
unsigned char            same                      8                uint8_t
char                     one of the above          8                int8_t or uint8_t
short                    signed short int          16               int16_t
unsigned short           unsigned short int        16               uint16_t
int                      signed int                16 or 32         int16_t or int32_t
unsigned                 unsigned int              16 or 32         uint16_t or uint32_t
long                     signed long int           32 or 64         int32_t or int64_t
unsigned long            unsigned long int         32 or 64         uint32_t or uint64_t
long long[1]             signed long long int      64               int64_t
unsigned long long[1]    unsigned long long int    64               uint64_t

The size and limits of the plain int type (without the short, long, or long long modifiers) vary much
more than the other integral types among C implementations. The Single UNIX Specification specifies
that the int type must be at least 32 bits, but the ISO C standard only requires 16 bits. Refer to limits.h
for guaranteed constraints on these data types. On most existing implementations, two of the five
integral types have the same bit widths.
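
As a rough illustration of the minimum widths listed above, the following sketch checks them at run time using limits.h. The macro BITS_OF and the function meets_iso_minimums are hypothetical names chosen for this example. Note that sizeof counts any padding bits, so BITS_OF reports storage size rather than the exact number of value bits.

```c
#include <limits.h>   /* CHAR_BIT, INT_MAX */

/* Storage size of a type in bits (hypothetical helper; includes padding bits, if any). */
#define BITS_OF(type) ((int)(sizeof(type) * CHAR_BIT))

/* Returns 1 if every type is at least as wide as the ISO C minimum, 0 otherwise. */
int meets_iso_minimums(void)
{
    return BITS_OF(signed char) >= 8
        && BITS_OF(short)       >= 16
        && BITS_OF(int)         >= 16
        && BITS_OF(long)        >= 32
        && BITS_OF(long long)   >= 64;
}
```

On a conforming implementation meets_iso_minimums() should return 1; the actual widths can be inspected by printing BITS_OF(int), BITS_OF(long), and so on.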

Integral type literal constants may be represented in one of two ways, by an integer type number, or by
a single character surrounded by single quotes. Integers may be represented in three bases: decimal (48
or -293), octal with a "0" prefix (0177), or hexadecimal with a "0x" prefix (0x3FE). A character in
single quotes ('F'), called a "character constant," represents the value of that character in the execution
character set (often ASCII). In C, character constants have type int (in C++, they have type char).
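
The notations above can be checked with a small sketch. The function names are hypothetical and exist only for this illustration; the second function relies on the code being compiled as C rather than C++.

```c
/* 0177 (octal) and 0x3FE (hexadecimal) denote the decimal values 127 and 1022. */
int bases_agree(void)
{
    return 0177 == 127 && 0x3FE == 1022;
}

/* In C, a character constant such as 'F' has type int, so its size matches int. */
int char_constant_is_int(void)
{
    return sizeof('F') == sizeof(int);
}
```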

Enumerated type

The enumerated type in C, specified with the enum keyword, and often just called an "enum," is a type
designed to represent values across a series of named constants. Each of the enumerated constants has
type int. Each enum type itself is compatible with char or a signed or unsigned integer type, but each
implementation defines its own rules for choosing a type.

Some compilers warn if an object with enumerated type is assigned a value that is not one of its
constants. However, such an object can be assigned any value in the range of its compatible type,
and enum constants can be used anywhere an integer is expected. For this reason, enum values are
often used in place of the preprocessor #define directives to create a series of named constants.

An enumerated type is declared with the enum specifier, an optional name for the enum, a list of one or
more constants contained within curly braces and separated by commas, and an optional list of variable
names. Subsequent references to a specific enumerated type use the enum keyword and the name of the
enum. By default, the first constant in an enumeration is assigned value zero, and each subsequent
value is incremented by one over the previous constant. Specific values may also be assigned to
constants in the declaration, and any subsequent constants without specific values will be given
incremented values from that point onward.

For example, consider the following declaration:

enum colors { RED, GREEN, BLUE = 5, YELLOW } paint_color;

This declares the enum colors type; the int constants RED (whose value is zero), GREEN (whose
value is one greater than RED, one), BLUE (whose value is the given value, five), and YELLOW
(whose value is one greater than BLUE, six); and the enum colors variable paint_color. The constants
may be used outside of the context of the enum, and values other than the constants may be assigned to
paint_color, or any other variable of type enum colors.
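
A minimal sketch of how such an enumeration might be used follows. The helper color_name is hypothetical; its final return statement handles values that are not among the named constants, which the language permits as described above.

```c
enum colors { RED, GREEN, BLUE = 5, YELLOW };

/* Hypothetical helper: maps a color to a printable name. */
const char *color_name(enum colors c)
{
    switch (c) {
    case RED:    return "red";
    case GREEN:  return "green";
    case BLUE:   return "blue";
    case YELLOW: return "yellow";
    }
    return "unknown";   /* reached for values that are not named constants */
}
```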

Floating point types

The floating-point form is used to represent numbers with a fractional component. Floating-point values
do not, however, represent most rational numbers exactly; they are close approximations instead. There are
three types of real values, denoted by their specifier: single-precision (specifier float), double-precision
(double) and double-extended-precision (long double). Each of these may represent values in a
different form, often one of the IEEE floating point formats.

Floating-point constants may be written in decimal notation, e.g. 1.23. Scientific notation may be used
by adding e or E followed by a decimal exponent, e.g. 1.23e2 (which has the value 123). Either a
decimal point or an exponent is required (otherwise, the number is an integer constant). Hexadecimal
floating-point constants follow similar rules except that they must be prefixed by 0x and use p or P to
specify a binary exponent (a power of two, written in decimal). Both decimal and hexadecimal floating-point constants may be
suffixed by f or F to indicate a constant of type float, by l or L to indicate type long double, or left
unsuffixed for a double constant.

The standard header file float.h defines the minimum and maximum values of the floating-point types
float, double, and long double. It also defines other limits that are relevant to the processing of floating-
point numbers.
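
The following sketch, assuming a C99 compiler (hexadecimal floating-point constants are a C99 feature), illustrates the constant forms and suffixes described above. All three function names are hypothetical.

```c
#include <float.h>   /* FLT_EPSILON, DBL_EPSILON, FLT_DIG, DBL_DIG */

/* Returns 1 if the hexadecimal and decimal spellings denote the same double value. */
int hex_matches_decimal(void)
{
    /* 0x1.8p1 is (1 + 8/16) * 2^1 = 3.0; 0x1p-2 is 1 * 2^-2 = 0.25 */
    return 0x1.8p1 == 3.0 && 0x1p-2 == 0.25;
}

/* The suffix selects the type: f gives float, none gives double, L gives long double. */
int suffix_selects_type(void)
{
    return sizeof(1.5f) == sizeof(float)
        && sizeof(1.5)  == sizeof(double)
        && sizeof(1.5L) == sizeof(long double);
}

/* float.h describes each format; double is at least as precise as float. */
int double_at_least_as_precise(void)
{
    return DBL_EPSILON <= FLT_EPSILON && DBL_DIG >= FLT_DIG;
}
```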

Storage duration specifiers

Every object has a storage class, which may be automatic, static, or allocated. Variables declared within
a block by default have automatic storage, as do those explicitly declared with the auto[2] or register
storage class specifiers. The auto and register specifiers may only be used within functions (register may also
appear in function parameter declarations); as such, the auto specifier is always redundant. Objects declared outside of all
blocks and those explicitly declared with the static storage class specifier have static storage duration.

Objects with automatic storage are local to the block in which they were declared and are discarded
when the block is exited. Additionally, objects declared with the register storage class may be given
higher priority by the compiler for access to registers; although they may not actually be stored in
registers, objects with this storage class may not be used with the address-of (&) unary operator.
Objects with static storage persist upon exit from the block in which they were declared. In this way,
the same object can be accessed by a function across multiple calls. Objects with allocated storage
duration are created and destroyed explicitly with malloc, free, and related functions.

The extern storage class specifier indicates that the storage for an object has been defined elsewhere.
When used inside a block, it indicates that the storage has been defined by a declaration outside of that
block. When used outside of all blocks, it indicates that the storage has been defined outside of the file.
The extern storage class specifier is redundant when used on a function declaration. It indicates that the
declared function has been defined outside of the file.
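
The difference between static and automatic storage can be sketched with two small functions. Both names are hypothetical and serve only as an illustration.

```c
/* counter has static storage duration: it is initialized once and keeps its value between calls. */
int next_id(void)
{
    static int counter = 0;
    return ++counter;
}

/* local has automatic storage duration: a fresh object is created on every call. */
int always_one(void)
{
    int local = 0;
    return ++local;
}
```

Successive calls to next_id() yield increasing values, whereas always_one() yields the same value every time.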

Type qualifiers

Objects can be qualified to indicate special properties of the data they contain. The const type qualifier
indicates that the value of an object should not change once it has been initialized. Attempting to
modify an object qualified with const yields undefined behavior, so some C implementations store
them in read-only segments of memory. The volatile type qualifier indicates that the value of an object
may be changed externally without any action by the program (see volatile variable); the compiler
must therefore not optimize away reads or writes of such an object.

Pointers

In declarations the asterisk modifier (*) specifies a pointer type. For example, where the specifier int
would refer to the integer type, the specifier int * refers to the type "pointer to integer". Pointer values
associate two pieces of information: a memory address and a data type. The following line of code
declares a pointer-to-integer variable called ptr:

int *ptr;

Referencing

When a non-static pointer is declared, it has an indeterminate value associated with it. The address
associated with such a pointer must be changed by assignment prior to using it. In the following
example, ptr is set so that it points to the data associated with the variable a:

int *ptr;
int a;

ptr = &a;

In order to accomplish this, the "address-of" operator (unary &) is used. It produces the memory
location of the data object that follows.

Dereferencing

The pointed-to data can be accessed through a pointer value. In the following example, the integer
variable b is set to the value 10:

int *p;
int a, b;

a = 10;
p = &a;
b = *p;

In order to accomplish that task, the dereference operator (unary *) is used. It returns the data to which
its operand (which must be of pointer type) points. Thus, the expression *p denotes the same value as a.
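
A common use of the address-of and dereference operators together is to let a function modify its caller's variables. The following is a minimal sketch; swap_ints and swap_demo are hypothetical names.

```c
/* Exchanges the two int objects that x and y point to. */
void swap_ints(int *x, int *y)
{
    int tmp = *x;   /* read through the pointer */
    *x = *y;        /* write through the pointer */
    *y = tmp;
}

/* Hypothetical demonstration: returns 1 if the swap is visible in the caller's variables. */
int swap_demo(void)
{
    int a = 10, b = 20;
    swap_ints(&a, &b);      /* pass the addresses of a and b */
    return a == 20 && b == 10;
}
```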

Arrays

Array definition

Arrays are used in C to represent structures of consecutive elements of the same type. The definition of
a (fixed-size) array has the following syntax:

int array[100];

which defines an array named array to hold 100 values of the primitive type int. If declared within a
function, the array dimension may also be a non-constant expression, in which case memory for the
specified number of elements will be allocated. In most contexts in later use, a mention of the variable
array is converted to a pointer to the first item in the array. The sizeof operator is an exception: sizeof
array yields the size of the entire array (that is, 100 times the size of an int). Another exception is the &
(address-of) operator, which yields a pointer to the entire array (e.g. int (*ptr_to_array)[100] =
&array;).

Accessing elements

The primary facility for accessing the values of the elements of an array is the array subscript operator.
To access the i-indexed element of array, the syntax would be array[i], which refers to the value stored
in that array element.

Array subscript numbering begins at 0. The largest allowed array subscript is therefore equal to the
number of elements in the array minus 1. To illustrate this, consider an array a declared as having 10
elements; the first element would be a[0] and the last element would be a[9]. C provides no facility for
automatic bounds checking for array usage. Though logically the last subscript in an array of 10
elements would be 9, subscripts 10, 11, and so forth could accidentally be specified, with undefined
results.

Due to array-pointer interchangeability, the addresses of each of the array elements can be expressed
in equivalent pointer arithmetic. The following table illustrates both methods for the existing array:

Array subscripts vs. pointer arithmetic

Element index          1          2              3              n
Array subscript        array[0]   array[1]       array[2]       array[n-1]
Dereferenced pointer   *array     *(array + 1)   *(array + 2)   *(array + n-1)

Similarly, since the expression a[i] is semantically equivalent to *(a+i), which in turn is equivalent to
*(i+a), the expression can also be written as i[a] (although this form is rarely used).
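
The equivalence of the two notations can be sketched as follows. The function names are hypothetical; both summing functions are intended to compute the same result.

```c
/* Sums n elements using subscripts. */
int sum_subscript(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Sums n elements using pointer arithmetic; equivalent to the function above. */
int sum_pointer(const int *a, int n)
{
    int total = 0;
    for (const int *p = a; p != a + n; p++)
        total += *p;
    return total;
}

/* Returns 1 if a[2], *(a + 2) and 2[a] all name the same element. */
int same_element(void)
{
    int a[4] = { 10, 20, 30, 40 };
    return &a[2] == a + 2 && &2[a] == a + 2 && 2[a] == 30;
}
```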

Dynamic arrays

A constant value is required for the dimension in a declaration of a static array. A desired feature is the
ability to set the length of an array dynamically at run-time instead:

int n = ...;
int a[n];
a[3] = 10;

This behavior can be simulated with the help of the C standard library. The malloc function provides a
simple method for allocating memory. It takes one parameter: the amount of memory to allocate in
bytes. Upon successful allocation, malloc returns a generic (void *) pointer value, pointing to the
beginning of the allocated space. The pointer value returned is converted to an appropriate type
implicitly by assignment. If the allocation could not be completed, malloc returns a null pointer. The
following segment is therefore similar in function to the above desired declaration:

#include <stdlib.h> /* declares malloc */

int *a;
a = malloc(n * sizeof(int));
a[3] = 10;

The result is a "pointer to int" variable (a) that points to the first of n contiguous int objects; due to
array-pointer equivalence this can be used in place of an actual array name, as shown in the last line.
The advantage in using this dynamic allocation is that the amount of memory that is allocated to it can
be limited to what is actually needed at run time, and this can be changed as needed (using the standard
library function realloc).
When the dynamically-allocated memory is no longer needed, it should be released back to the run-
time system. This is done with a call to the free function. It takes a single parameter: a pointer to
previously allocated memory. This is the value that was returned by a previous call to malloc. It is
considered good practice to then set the pointer variable to NULL so that further attempts to access the
memory to which it points will fail. If this is not done, the variable becomes a dangling pointer, and
such errors in the code (or manipulations by an attacker) might be very hard to detect and lead to
obscure and potentially dangerous malfunction caused by memory corruption.

free(a);
a = NULL;
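
Putting allocation, resizing, and release together, a more defensive version might look like the sketch below. The helpers make_filled and resize are hypothetical names, not part of the standard library; the size checks guard against overflow in the multiplication, which the shorter fragments above omit.

```c
#include <stdlib.h>   /* malloc, realloc, free */

/* Allocates n ints, each set to fill; returns NULL if n is 0 or allocation fails. */
int *make_filled(size_t n, int fill)
{
    if (n == 0 || n > (size_t)-1 / sizeof(int))
        return NULL;                      /* reject empty and overflowing requests */
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = fill;
    return a;
}

/* Resizes *pa to new_n ints; on failure the old block is left untouched. Returns 1 on success. */
int resize(int **pa, size_t new_n)
{
    if (new_n == 0 || new_n > (size_t)-1 / sizeof(int))
        return 0;
    int *tmp = realloc(*pa, new_n * sizeof **pa);
    if (tmp == NULL)
        return 0;
    *pa = tmp;
    return 1;
}
```

Assigning the result of realloc to a temporary first avoids losing the original pointer (and leaking the block) when the call fails. The caller remains responsible for calling free on the returned pointer.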

The C99 standard also supports variable-length arrays (VLAs) within block scope. Such array variables
are allocated based on the value of an integer expression evaluated at runtime upon entry to a block, and are
deallocated at the end of the block.

float read_and_process(int sz)
{
    float vals[sz];   // VLA, size determined at runtime

    for (int i = 0; i < sz; i++)
        vals[i] = read_value();
    return process(vals, sz);
}

Multidimensional arrays

In addition, C supports arrays of multiple dimensions, which are stored in row-major order.
Technically, C multidimensional arrays are just one-dimensional arrays whose elements are arrays. The
syntax for declaring multidimensional arrays is as follows:

int array2d[ROWS][COLUMNS];

(where ROWS and COLUMNS are constants); this defines a two-dimensional array. Reading the
subscripts from left to right, array2d is an array of length ROWS, each element of which is an array of
COLUMNS integers.

To access an integer element in this multidimensional array, one would use

array2d[4][3]

Again, reading from left to right, this accesses the 5th row, 4th element in that row (array2d[4] is an
array, which we are then subscripting with the [3] to access the fourth integer).

Higher-dimensional arrays can be declared in a similar manner.

A multidimensional array should not be confused with an array of references to arrays (also known as
Iliffe vectors or sometimes array of arrays). The former is always rectangular (all subarrays must be the
same size), and occupies a contiguous region of memory. The latter is a one-dimensional array of
pointers, each of which may point to the first element of a subarray in a different place in memory, and
the sub-arrays do not have to be the same size. The latter can be created by multiple use of malloc.
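
The contrast between the two layouts can be sketched as follows. The names rows_are_contiguous, make_triangle, and free_triangle are hypothetical, and the triangular shape is chosen only to show rows of differing length.

```c
#include <stddef.h>   /* ptrdiff_t */
#include <stdlib.h>   /* malloc, calloc, free */

enum { ROWS = 3, COLUMNS = 4 };

/* A true two-dimensional array is one contiguous block, stored row by row. */
int rows_are_contiguous(void)
{
    int grid[ROWS][COLUMNS];
    /* row 1 starts exactly COLUMNS ints after row 0, and there are no gaps overall */
    return (char *)grid[1] - (char *)grid[0] == (ptrdiff_t)(COLUMNS * sizeof(int))
        && sizeof grid == ROWS * COLUMNS * sizeof(int);
}

/* An Iliffe vector: an array of pointers to separately allocated rows.
   Row r gets r + 1 zero-initialized elements. Returns NULL on failure. */
int **make_triangle(int rows)
{
    if (rows <= 0)
        return NULL;
    int **t = malloc((size_t)rows * sizeof *t);
    if (t == NULL)
        return NULL;
    for (int r = 0; r < rows; r++) {
        t[r] = calloc((size_t)r + 1, sizeof **t);
        if (t[r] == NULL) {             /* undo the partial allocation */
            while (r-- > 0)
                free(t[r]);
            free(t);
            return NULL;
        }
    }
    return t;
}

/* Releases every row, then the vector of row pointers itself. */
void free_triangle(int **t, int rows)
{
    for (int r = 0; r < rows; r++)
        free(t[r]);
    free(t);
}
```

Both forms are indexed with the same t[r][c] syntax, but only the first guarantees a single contiguous region of memory.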

Strings

In C, string literals (constants) are surrounded by double quotes ("), e.g. "Hello world!", and are
compiled to an array of the specified char values with an additional null terminating character
(0-valued) to mark the end of the string.

String literals may not contain embedded newlines; this proscription somewhat simplifies parsing of
the language. To include a newline in a string, the backslash escape \n may be used, as below.

There are several standard library functions for operating with string data (not necessarily constant)
organized as array of char using this null-terminated format; see below.

C's string-literal syntax has been very influential, and has made its way into many other languages,
such as C++, Perl, Python, PHP, Java, JavaScript, C#, and Ruby. Nowadays, almost all new languages adopt
or build upon C-style string syntax. Languages that lack this syntax tend to precede C.

Backslash escapes

If you wish to include a double quote inside the string, that can be done by escaping it with a backslash
(\), for example, "This string contains \"double quotes\".". To insert a literal backslash, one must double
it, e.g. "A backslash looks like this: \\".

Backslashes may be used to enter control characters, etc., into a string:

Escape Meaning
\\ Literal backslash
\" Double quote
\' Single quote
\n Newline (line feed)
\r Carriage return
\b Backspace
\t Horizontal tab
\f Form feed
\a Alert (bell)
\v Vertical tab
\? Question mark (used to escape trigraphs)
\nnn Character with octal value nnn
\xhh Character with hexadecimal value hh

The use of other backslash escapes is not defined by the C standard, although compiler vendors often
provide additional escape codes as language extensions.
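
As a brief illustration of the escapes in the table above, the sketch below shows that each escape sequence compiles to a single char. The function names are hypothetical.

```c
#include <string.h>   /* strlen */

/* Each escape sequence occupies a single char in the compiled string. */
size_t escaped_length(void)
{
    return strlen("\"\\\n\t");      /* double quote, backslash, newline, tab */
}

/* Octal and hexadecimal escapes give a character by its numeric value. */
int numeric_escapes_match(void)
{
    const char *s = "\101\x42";     /* values 65 and 66, i.e. 'A' and 'B' if ASCII is used */
    return s[0] == 65 && s[1] == 0x42 && strlen(s) == 2;
}
```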

String literal concatenation

Adjacent string literals are concatenated at compile time; this allows long strings to be split over
multiple lines, and also allows string literals resulting from C preprocessor defines and macros to be
appended to strings at compile time:

printf(__FILE__ ": %d: Hello "
       "world\n", __LINE__);

will expand to

printf("helloworld.c" ": %d: Hello "
       "world\n", 10);

which is syntactically equivalent to

printf("helloworld.c: %d: Hello world\n", 10);
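
The same mechanism works with user-defined macros. In the following sketch, GREETING, TARGET, and message are hypothetical names introduced for illustration.

```c
#define GREETING "Hello"    /* hypothetical macro */
#define TARGET   "world"    /* hypothetical macro */

/* The adjacent pieces are merged by the compiler into a single literal. */
const char *message(void)
{
    return GREETING ", "
           TARGET "!";
}
```

Here message() returns a pointer to the single string "Hello, world!"; no concatenation takes place at run time.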

Character constants

Individual character constants are represented by single-quotes, e.g. 'A', and have type int (in C++,
char). The difference from a string literal is that "A" represents a null-terminated array of characters
(which in most contexts is converted to a pointer to its first element), whereas 'A' directly represents
the code value (65 if ASCII is used). The same backslash-escapes are supported as for strings, except
that " can validly be used as a character without being escaped, whereas ' must now be escaped. A
character constant cannot be empty (i.e. '' is invalid syntax), although a string may be (it still has
the null terminating character). Multi-character constants (e.g. 'xy') are valid, although rarely useful;
they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer,
8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified,
portable use of multi-character constants is difficult.

Wide character strings

Since type char is usually 8 bits wide, a single char value typically can represent at most 256 distinct
character codes, not nearly enough for all the characters in use worldwide. To provide better support for
international characters, the first C standard (C89) introduced wide characters (encoded in type
wchar_t) and wide character strings, which are written as L"Hello world!".

Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as UTF-16) or 4
bytes (usually UTF-32), but Standard C does not specify the width for wchar_t, leaving the choice to
the implementor. Microsoft Windows generally uses UTF-16, thus the above string would be 26 bytes
long for a Microsoft compiler; the Unix world prefers UTF-32, thus compilers such as GCC would
generate a 52-byte string. A 2-byte wide wchar_t suffers the same limitation as char, in that certain
characters (those outside the BMP) cannot be represented in a single wchar_t, but must be represented
using surrogate pairs.
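
The byte counts quoted above can be derived on any implementation with sizeof. This is a minimal sketch; the two function names are hypothetical.

```c
#include <wchar.h>    /* wchar_t, wcslen */

/* Number of wide characters in the literal, not counting the terminator. */
size_t greeting_length(void)
{
    return wcslen(L"Hello world!");
}

/* Storage occupied by the literal, including the terminating null wide character. */
size_t greeting_bytes(void)
{
    return sizeof(L"Hello world!");
}
```

The literal holds 12 characters plus the terminator, so greeting_bytes() is 13 times sizeof(wchar_t): 26 with a 2-byte wchar_t and 52 with a 4-byte one.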

The original C standard specified only minimal functions for operating with wide character strings; in
1995 the standard was modified to include much more extensive support, comparable to that for char
strings. The relevant functions are mostly named after their char equivalents, with the addition of a "w"
or the replacement of "str" with "wcs"; they are specified in <wchar.h>, with <wctype.h> containing
wide-character classification and mapping functions.

Variable width strings

A common alternative to wchar_t is to use a variable-width encoding, whereby a logical character may
extend over multiple positions of the string. Variable-width strings may be encoded into literals
verbatim, at the risk of confusing the compiler, or using numerical backslash escapes (e.g. "\xc3\xa9"
for "é" in UTF-8). The UTF-8 encoding was specifically designed (under Plan 9) for compatibility with
the standard library string functions; supporting features of the encoding include a lack of embedded
nulls, no valid interpretations for subsequences, and trivial resynchronisation. Encodings lacking these
features are likely to prove incompatible with the standard library functions; encoding-aware string
functions are often used in such case.
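
As an example of an encoding-aware function, the sketch below counts logical characters (code points) in a UTF-8 string, whereas strlen counts bytes. The name utf8_code_points is hypothetical, and the function assumes its input is already valid UTF-8.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen */

/* Counts code points in a UTF-8 string by skipping continuation bytes (10xxxxxx).
   Assumes s is valid UTF-8; no validation is performed. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}
```

For the string "\xc3\xa9", strlen reports two bytes while utf8_code_points reports one character.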

Library functions

Strings, both constant and variable, may be manipulated without using the standard library. However,
the library contains many useful functions for working with null-terminated strings. It is the
programmer's responsibility to ensure that enough storage has been allocated to hold the resulting
strings.

The most commonly used string functions are:

* strcat(dest, source) - appends the string source to the end of string dest
* strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it or a null
pointer if c is not found
* strcmp(a, b) - compares strings a and b (lexicographical ordering); returns negative if a is less than
b, 0 if equal, positive if greater.
* strcpy(dest, source) - copies the string source onto the string dest
* strlen(st) - returns the length of string st
* strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of
string dest and null terminates the string at the end of input or at index n+1 when the max length is
reached
* strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexicographical ordering);
returns negative if a is less than b, 0 if equal, positive if greater
* strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it or a null
pointer if c is not found

Other standard string functions include:

* strcoll(s1, s2) - compare two strings according to a locale-specific collating sequence
* strcspn(s1, s2) - returns the index of the first character in s1 that matches any character in s2
* strerror(errno) - returns a string with an error message corresponding to the code in errno
* strncpy(dest, source, n) - copies n characters from the string source onto the string dest,
substituting null bytes once past the end of source; does not null terminate if max length is reached
* strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2 or a
null pointer if not found
* strspn(s1, s2) - returns the index of the first character in s1 that matches no character in s2
* strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a null pointer if
no such substring exists
* strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
* strxfrm(s1, s2, n) - transforms s2 onto s1, such that s1 used with strcmp gives the same results as
s2 used with strcoll

There is a similar set of functions for handling wide character strings.
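
Because storage management is left to the programmer, library calls are often wrapped in small helpers. The following is a sketch under that assumption; copy_bounded and extension are hypothetical names, not standard functions.

```c
#include <string.h>   /* strncpy, strlen, strrchr */

/* Copies src into dest (capacity cap bytes), truncating if needed and always
   null-terminating. Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dest, size_t cap, const char *src)
{
    if (cap == 0)
        return 0;
    strncpy(dest, src, cap - 1);    /* may leave dest unterminated ... */
    dest[cap - 1] = '\0';           /* ... so terminate explicitly */
    return strlen(src) < cap;
}

/* Returns the text after the last '.', or "" if there is none. */
const char *extension(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot != NULL ? dot + 1 : "";
}
```

The explicit terminator in copy_bounded compensates for the fact that strncpy does not null terminate when the maximum length is reached.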

Structures and unions

Structures

Structures in C are defined as data containers consisting of a sequence of named members of various
types. They are similar to records in other programming languages. The members of a structure are
stored in consecutive locations in memory, although the compiler is allowed to insert padding between
or after members (but not before the first member) for efficiency. The size of a structure is equal to the
sum of the sizes of its members, plus the size of the padding.
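
The layout rules above can be observed with sizeof and the offsetof macro from stddef.h. In this sketch, struct sample and layout_rules_hold are hypothetical; the actual amount of padding depends on the implementation.

```c
#include <stddef.h>   /* offsetof, size_t */

struct sample {       /* hypothetical structure */
    char   tag;
    double value;
    int    count;
};

/* Returns 1 if the layout rules described above hold for struct sample. */
int layout_rules_hold(void)
{
    size_t members = sizeof(char) + sizeof(double) + sizeof(int);
    return offsetof(struct sample, tag) == 0               /* no padding before the first member */
        && offsetof(struct sample, value) >= sizeof(char)  /* members are stored in order */
        && offsetof(struct sample, count) >= offsetof(struct sample, value) + sizeof(double)
        && sizeof(struct sample) >= members;               /* padding can only add to the size */
}
```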

Unions

Unions in C are related to structures and are defined as objects that may hold (at different times)
objects of different types and sizes. They are analogous to variant records in other programming
languages. Unlike structures, the components of a union all refer to the same location in memory. In
this way, a union can be used at various times to hold different types of objects, without the need to
create a separate object for each new type. The size of a
union is equal to the size of its largest component type.
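
A minimal sketch of these properties follows. The type union number and both function names are hypothetical, chosen only for illustration.

```c
union number {        /* hypothetical union */
    int    i;
    double d;
};

/* Returns 1 if all members start at the same address and the union can hold each of them. */
int union_rules_hold(void)
{
    union number n;
    return (void *)&n.i == (void *)&n
        && (void *)&n.d == (void *)&n
        && sizeof n >= sizeof(double)
        && sizeof n >= sizeof(int);
}

/* Stores a double in the union, then reads the same member back. */
double round_trip(double x)
{
    union number n;
    n.d = x;
    return n.d;
}
```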

Declaration

Structures are declared with the struct keyword and unions are declared with the union keyword. The
specifier keyword is followed by an optional identifier name, which is used to identify the form of the
structure or union. The identifier is followed by the declaration of the structure or union's body: a list of
member declarations, contained within curly braces, with each declaration terminated by a semicolon.
Finally, the declaration concludes with an optional list of identifier names, which are declared as
instances of the structure or union.

For example, the following statement declares a structure named s that contains three members; it will
also declare an instance of the structure known as t: