C Reference Manual Reading Notes

1. CHARACTER SET
2. Whitespace, Line Termination, and Line Length Limits
3. Multibyte and Wide Characters
4. Comments
5. Tokens ( Without Constants )
6. Constants
7. C++ Compatibility
8. The C Preprocessor and Preprocessor Commands
- 8.1. The C preprocessor
- 8.2. Preprocessor Commands
9. Preprocessor Lexical Conventions
10. Definition and Replacement

1 CHARACTER SET

A C source file is a sequence of characters selected from a character set. C programs are written using the following characters:

1). the 52 Latin capital and small letters: A~Z and a~z
2). the 10 digits: 0~9
3). the space
4). the horizontal tab (HT), vertical tab (VT), and form feed (FF) control characters
5). the 29 graphic characters listed below with their official names

Character  Official name
     !     exclamation mark
     #     number sign
     %     percent sign
     ^     circumflex accent
     &     ampersand
     *     asterisk
     (     left parenthesis
     _     low line (underscore)
     )     right parenthesis
     -     hyphen-minus
     +     plus sign
     =     equals sign
     ~     tilde
     [     left square bracket
     ]     right square bracket
     '     apostrophe
     |     vertical line
     \     reverse solidus (backslash)
     ;     semicolon
     :     colon
     "     quotation mark
     {     left curly bracket
     }     right curly bracket
     ,     comma
     .     full stop
     <     less-than sign
     >     greater-than sign
     /     solidus (slash, divide sign)
     ?     question mark

Some countries have national character sets that do not include all the graphic characters above. Standard C defines trigraphs and token respellings to allow C programs to be written in the ISO 646-1983 Invariant Code Set.

6). additional characters that are sometimes used in C source programs, including:
- formatting characters such as the backspace (BS) and carriage return (CR) characters
- additional Basic Latin characters, including $, @, and ` (grave accent)

The formatting characters are treated as spaces and do not otherwise affect the source program. The additional graphic characters may appear only in comments, character constants, string constants, and file names.

1.1 Execution Character Set

The character set interpreted during the execution of a C program is not necessarily the same as the one in which the C program is written (as with a cross compiler, for example). Characters in the execution character set are represented by their equivalents in the source character set or by special character escape sequences that begin with the backslash (\) character.

In addition to the standard characters mentioned before, the execution character set must also include:

1). a null character that must be encoded as the value 0, which is used to mark the end of strings
2). a newline character that is used as the end-of-line marker, which divides character streams into lines during input/output
3). the alert, backspace, and carriage return characters

1.2 Whitespace and Line Termination

In C source programs the blank (space), end-of-line, VT, FF, and HT characters are known collectively as whitespace characters. (Comments are also whitespace.) These characters are ignored except insofar as they are used to separate adjacent tokens.

1.3 Character Encoding

A common C programming error is to assume that a particular encoding is in use when in fact another one holds.

1.4 Trigraphs

A set of trigraphs is included in Standard C so that programs may be written using only the ISO 646-1983 Invariant Code Set, a subset of the seven-bit ASCII code set that is common to many non-English national character sets. The trigraphs, each introduced by two consecutive question mark characters, are listed below:

??(            [

??)            ]

??<           {

??>           }

??/            \

??!            |

??'            ^

??-            ~

??=           #

1.5 Digraphs

<:           [

:>           ]

<%         {

%>         }

%:          #

%:%:     ##

1.6 Closing Example: a Hello World Program

%:include <stdio.h>
int main() <%
  char buf<:??)="Hello world !";
  printf("%s\n", buf);
  return 0;
??>


~$ gcc -o hello hello.c -trigraphs
~$ ./hello
Hello world !

2 Whitespace, Line Termination, and Line Length Limits

The blank (space), end-of-line, VT, FF, HT, and comments are known collectively as whitespace characters (WSC). These characters are ignored except insofar as they are used to separate adjacent tokens or when they appear in character constants, string constants, or #include file names. WSC may be used to lay out the C program in a way that is pleasing to a human reader.

The end-of-line character or character sequence marks the end of source program lines. In some implementations, the formatting characters CR, FF, and VT additionally terminate source lines, and are called line break characters. Line termination is important for the recognition of preprocessor control lines.

A source line can be continued onto the next line by ending the first line with a backslash (\) or with the trigraph ??/. Most C implementations impose a limit on the maximum length of source lines both before and after splicing continuation lines. C89 requires implementations to permit logical source lines of at least 509 characters; C99 raises this minimum to 4095 characters.
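As a small illustration of line splicing, the sketch below continues both a macro definition and a string literal across physical lines. The names SQUARE_SUM, spliced_text, and splice_matches are made up for these notes and do not come from the book.

```c
#include <string.h>

/* A macro definition continued onto a second physical line.
 * The backslash has to be the last character on the first line. */
#define SQUARE_SUM(a, b) \
    ((a) * (a) + (b) * (b))

/* Splicing happens even inside a string literal. The continuation
 * line starts in column 1 on purpose: any leading blanks would
 * become part of the string. */
static const char spliced_text[] = "Hello, \
world";

/* Returns 1 if the spliced literal equals the one-line spelling. */
int splice_matches(void)
{
    return strcmp(spliced_text, "Hello, world") == 0;
}
```

Calling splice_matches() should yield 1, since the backslash and the line break are removed before the literal is tokenized.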

3 Multibyte and Wide Characters

To accommodate non-English alphabets that may contain a large number of characters, Standard C introduces wide characters and wide strings. To represent wide characters and wide strings in the external, byte-oriented world, the concept of multibyte characters is introduced.

Wide Characters and Strings. A wide character is a binary representation of an element of an extended character set. It has the integer type wchar_t, which is declared in the header file stddef.h. Standard C does not specify the encoding of the extended character set, other than requiring that the "null wide character" be zero and that a value WEOF exist (commonly -1).

Multibyte Characters. A multibyte character is the representation of a wide character in either the source or execution character set. (There may be a different encoding for each.) A multibyte string is a normal C string whose characters can be interpreted as a series of multibyte characters. The form of multibyte characters and the mapping between multibyte and wide characters are implementation-defined. This mapping is performed for wide-character and wide-string constants at compile time, and the standard library provides functions that perform this mapping at run time. Multibyte character encodings can be state-dependent or state-independent.

Standard C places some restrictions on multibyte characters:

1). All characters from the standard character set must be present in the encoding.
2). In the initial shift state, all single-byte characters from the standard character set retain their normal interpretation and do not affect the shift state.
3). A byte containing all zeros is taken to be the null character regardless of shift state. No multibyte character can use a byte containing all zeros as its second or subsequent character.

Together, these rules ensure that multibyte sequences can be processed as normal C strings (e.g., they will not contain embedded null characters) and that a C string without special multibyte codes will have the expected interpretation as a multibyte sequence.
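To make the run-time mapping a little more concrete, here is a minimal sketch that converts a multibyte string to wide characters with the standard library function mbstowcs. The wrapper name to_wide is hypothetical. The sketch assumes the default "C" locale, where a plain single-byte string is already a valid multibyte sequence, as the rules above promise.

```c
#include <stdlib.h>
#include <wchar.h>

/* Converts a multibyte string to wide characters.
 * Returns the number of wide characters stored (not counting the
 * terminator), or (size_t)-1 if an invalid sequence is found. */
size_t to_wide(const char *mb, wchar_t *out, size_t capacity)
{
    return mbstowcs(out, mb, capacity);
}
```

With a buffer of 8 elements, to_wide("abc", buf, 8) should report 3 characters and leave a null wide character after them.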

Source and Execution Use of Multibyte Characters. Multibyte characters may appear in comments, identifiers, preprocessor header names, string constants, and character constants. Multibyte characters in the physical representation of the source are recognized and translated to the source character set before any lexical analysis, preprocessing, or even splicing of continuation lines. During processing, characters appearing in string and character constants are translated to the execution character set before they are interpreted as multibyte sequences.

4 Comments

Standard C supports two styles of comments:

1). A comment that begins with the two characters /* and ends with the first subsequent occurrence of the two characters */.
2). A comment that begins with the two characters // and extends up to (but does not include) the next line break.

Comments may contain any number of characters and are always treated as whitespace. Comments are not recognized inside string or character constants or within other comments. The contents of comments are not examined by the C implementation except to recognize (and pass over) multibyte characters and line breaks. Comments are removed by the compiler before preprocessing, so preprocessor commands inside comments will not be recognized, and line breaks inside comments do not terminate preprocessor commands. The following two #define commands have the same effect:

#define ten (2*5)
#define ten   /* ten
                       * one greater than nine
                       */  (2*5)

Although some nonstandard C implementations support "nestable comments", do not depend on them.

To cause the compiler to ignore large parts of C source, it is best to enclose the parts to be removed with the preprocessor commands

#if 0

... ...

#endif

rather than inserting /* before and */ after the text. This avoids having to worry about /*-style comments in the enclosed source text.
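A short sketch of this advice, using a hypothetical function name (answer): the disabled region contains a /* */ comment of its own, which would end an enclosing /* */ comment too early but is harmless inside #if 0.

```c
int answer(void)
{
    int value = 42;
#if 0
    /* This whole region, including this comment, is skipped. */
    value = 0;
#endif
    return value;
}
```

Because the assignment is compiled out, answer() should return 42.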

5 Tokens ( Without Constants )

There are five classes of tokens: operators, separators, identifiers, keywords, and constants.

5.1 Operators and Separators

! % ^ & * - + = ~ | . < > / ?

+= -= *= /= %=

<<= >>= &= ^= |=

-> ++ -- << >>

<= >= == != && ||

() [] {} , ; : ...

<% %> <: :> %: %:%: (see section 1.5)

5.2 Identifiers

An identifier, or name, is a sequence of Latin capital and small letters, digits, and the LOW LINE (underscore) character. An identifier must not begin with a digit, and it must not have the same spelling as a keyword.

Beginning with C99, identifiers may also contain universal character names and other implementation-defined multibyte characters. Universal character names must not be used to place a digit at the beginning of an identifier, and they are further restricted to "letter-like" characters, not punctuators.

Identifiers are case-sensitive.

Identifiers should not begin with an underscore followed by either a capital (uppercase) letter or another underscore, because all such names are reserved for the implementation and its standard library.

Internal identifiers: C89 requires implementations to permit a minimum of 31 significant characters in identifiers, and C99 raises this minimum to 63 characters.

External identifiers: C89 requires a minimum capacity of only six characters, not counting letter case. C99 raises this to 31 characters, including letter case. However, an implementation may count each universal character name as 6 characters (code points up to 0000FFFF) or 10 characters (00010000 and above).

5.3 Keywords

auto _Bool break case char _Complex const continue default restrict do double
else enum extern float for goto if _Imaginary inline int long register return short
signed sizeof static struct switch typedef union unsigned void volatile while

5.4 Constants

Constants are more complicated than the token classes above, so the next section covers them separately.

6 Constants

The lexical class of constants includes four different kinds of constants: integers, floating-point numbers, characters, and strings. Such tokens are called literals in other languages to distinguish them from objects whose values are constant (i.e., not changing) but that do not belong to lexically distinct classes. An example of these latter objects in C is enumeration constants, which belong to the lexical class of identifiers. The book uses the traditional C terminology of constant for both cases.

6.1 Integer Constants

Integer constants may be specified in decimal, octal, or hexadecimal notation. These are the rules for determining the radix of an integer constant:

If the integer constant begins with the letters 0X or 0x, then it is in hexadecimal notation, with the characters a through f (or A through F) representing 10 through 15.
Otherwise, if it begins with digit 0, then it is in octal notation.
Otherwise, it is in decimal notation.

An integer constant may be immediately followed by suffix letters to designate a minimum size for its type:

Letters l or L indicate a constant of type long.
Letters ll or LL indicate a constant of type long long (C99). Note that the mixed-case forms Ll and lL are invalid.
Letters u or U indicate an unsigned type (unsigned int, unsigned long, or unsigned long long).
The unsigned suffix may be combined with the long or long long suffix in any order, as in 100uLL or 200LLU.

These are valid integer constants: 100, 1000L, 200ll, 0x300, 0777, 0x98FCEuLL.
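The radix and suffix rules can be checked with a few lines of C. The following is only a sketch; the function names constant_value and suffix_sizes_match are invented for these notes.

```c
/* Each constant below is one of the valid spellings listed above. */
long long constant_value(int which)
{
    switch (which) {
    case 0:  return 100;        /* decimal */
    case 1:  return 0777;       /* octal: 7*64 + 7*8 + 7 = 511 */
    case 2:  return 0x300;      /* hexadecimal: 3*256 = 768 */
    default: return 0x98FCEuLL; /* hexadecimal, unsigned long long */
    }
}

/* A suffix selects a minimum type, which sizeof makes visible. */
int suffix_sizes_match(void)
{
    return sizeof(1000L) == sizeof(long)
        && sizeof(200ll) == sizeof(long long)
        && sizeof(0x98FCEuLL) == sizeof(unsigned long long);
}
```

Note that the leading 0 in 0777 changes the value: it is 511, not 777.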

6.2 Floating-Point Constants

Floating-point constants may be written with a decimal point, a signed exponent, or both. Standard C allows a suffix letter (floating-suffix) to designate constants of type float and long double. Without a suffix, the type of the constant is double. Letters f or F indicate a constant of type float, and letters l or L indicate a constant of type long double. In C99, a complex floating-point constant is written as a floating-point constant expression involving the imaginary constant _Complex_I (or I) defined in complex.h. C99 permits floating-point constants to be expressed in hexadecimal notation; previous versions of C had only decimal floating-point constants. The hexadecimal format uses the letter p to separate the fraction from the exponent because the customary e could be confused with a hexadecimal digit. The binary exponent is a signed decimal number that represents a power of 2 (not a power of 10).

These are valid floating-point constants: 0., 1e21, 3.1415, .01, 1.E-3, 0.52f, 5.5E2L, 0x55fceep-5
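A brief sketch of the suffix and hexadecimal rules, assuming a C99 (or later) compiler. The function names are hypothetical, and the values were chosen because they are exactly representable in binary floating point.

```c
/* Hexadecimal floating constants use a power-of-two exponent after p. */
double hex_three(void)  { return 0x1.8p1; }  /* 1.5 * 2^1  = 3.0 */
double hex_half(void)   { return 0x1p-1; }   /* 1.0 * 2^-1 = 0.5 */
long double big_l(void) { return 5.5E2L; }   /* 5.5 * 10^2 = 550 */

/* The suffix (or its absence) determines the type of the constant. */
int suffix_types_match(void)
{
    return sizeof(0.52f) == sizeof(float)
        && sizeof(3.1415) == sizeof(double)
        && sizeof(5.5E2L) == sizeof(long double);
}
```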

6.3 Character Constants

A character constant is written by enclosing one or more characters in apostrophes. A special escape mechanism is provided to write characters or numeric values that would be inconvenient or impossible to enter directly in the source program. Standard C allows the character constant to be preceded by the letter L to specify a wide character constant. The value of a character constant is implementation-defined if:

there is no corresponding character in the execution character set,
more than a single execution character appears in the constant, or
a numeric escape has a value not represented in the execution character set.

Character constants not preceded by the letter L have type int. Wide character constants, designated by the prefix letter L, have type wchar_t.

Examples of single-character constants along with their (decimal) values under the ASCII encoding:

'a' = 97,  '\r' = 13,  ' ' = 32,  '\'' = 39,  '"' = 34,  '\0' = 0,  '\377' = 255 (or -1 where char is signed),  '\23' = 19,  '\\' = 92
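The type rule and the numeric escapes can be tried out directly. This is just a sketch with invented function names.

```c
/* In C a character constant has type int, not char. */
int char_constant_is_int(void)
{
    return sizeof('a') == sizeof(int);
}

/* The octal escape '\23' is 2*8 + 3; the hexadecimal escape '\x13'
 * names the same value. */
int escape_value(void)
{
    return '\23';
}
```

escape_value() should return 19 whatever the encoding, because numeric escapes specify the value directly rather than naming a character.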

6.4 String Constants

A string constant is a (possibly empty) sequence of characters enclosed in double quotes. The same escape mechanism provided for character constants can be used to express the characters in the string. Standard C allows the string constant to be preceded by the letter L to specify a wide string constant.

For each nonwide string constant of n characters, at run time there will be a statically allocated block of n+1 characters whose first n characters are the characters from the string and whose last character is the null character, '\0'. This block is the value of the string constant and its type is char[n+1]. Wide string constants similarly become n wide characters followed by a null wide character and have type wchar_t[n+1].

If a string constant appears anywhere except as an argument to the address operator &, an argument to the sizeof operator, or as an initializer of a character array, then the usual array conversions come into play, changing the string from an array of characters to a pointer to the first character in the string.
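The n+1 rule and the array-to-pointer conversion can be seen in a few lines. A minimal sketch with hypothetical function names:

```c
#include <stddef.h>

/* sizeof sees the array type, including the terminating null. */
size_t narrow_size(void) { return sizeof "abc"; }                    /* char[4] */
size_t wide_elems(void)  { return sizeof L"abc" / sizeof(wchar_t); } /* wchar_t[4] */

/* Anywhere else the array is converted to a pointer to its first
 * character. */
char first_char(void)
{
    const char *p = "abc";   /* usual array conversion */
    return *p;
}
```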

6.5 Escape Characters

a -- alert,                b--backspace,               f--form feed,
n--new line,               r--carriage return
t--horizontal tab,    v--vertical tab,              \--backslash,
'--single quote,         "--double quote
?--question mark
numeric escape code: '\004', '\006', '\xab', etc.

7 C++ Compatibility

7.1 Character Sets

The token respellings and trigraphs in Standard C are part of the C++ Standard, but they are not common in pre-Standard C++ implementations. Both C and C++ allow universal character names with the same syntax, but only C explicitly allows other implementation-defined characters in identifiers. (One expects that C++ implementations will provide them as an extension.)

7.2 Comments

C99 comments are acceptable as C++ and vice versa. Before C99, the characters // did not introduce a comment in Standard C, and so the sequence of characters //* in C could be interpreted differently in C++.

7.3 Operators

There are three new compound operators in C++:

.*    ->*    ::

Since these combinations of tokens would be invalid in Standard C programs, there is no impact on portability from C to C++.

7.4 Identifiers and Keywords

The identifiers listed below are keywords in C++, but not in C. However, the name wchar_t is reserved in Standard C, and the names bool, true, and false are reserved in C99 as part of the standard library.

asm               export          private               throw

bool              false           protected             true

catch             friend          public                try

class             mutable         reinterpret_cast      typeid

const_cast        namespace       static_cast           typename

delete            new             template              using

dynamic_cast      operator        this                  virtual

explicit          wchar_t

7.5 Character Constants

Single-character constants have type int in C, but have type char in C++. Multicharacter constants, which are implementation-defined, have type int in both languages. In practice, this makes little difference since in C++ character constants used in integral contexts are promoted to int under the usual conversions. However, sizeof('C') is sizeof(char) in C++, whereas it is sizeof(int) in C.

8 The C Preprocessor and Preprocessor Commands

8.1 The C preprocessor

The C preprocessor is a simple macro processor that conceptually processes the source text of a C program before the compiler proper reads the source program. In some implementations of C (gcc, for example), the preprocessor is actually a separate program that reads the original source file and writes out a new "preprocessed" source file that can then be used as input to the C compiler. In other implementations, a single program performs the preprocessing and compilation in a single pass over the source file.

8.2 Preprocessor Commands

The preprocessor is controlled by special preprocessor command lines, which are lines of the source file beginning with the character #. Lines that do not contain preprocessor commands are called lines of source program text. The preprocessor commands are as follows:

#define      Define a preprocessor macro

#undef       Remove a preprocessor macro definition

#include     Insert text from another source file

#if          Conditionally include some text based on the value of a constant expression

#ifdef       Conditionally include some text based on whether a macro name is defined

#ifndef      Conditionally include some text with the sense of the test opposite to that of #ifdef

#else        Alternatively include some text if the previous #if, #ifdef, #ifndef, or #elif test failed

#endif       Terminate conditional test

#line        Supply a line number for compiler messages

#elif        Alternatively include some text based on the value of another constant expression if the previous #if, #ifdef, #ifndef, or #elif test failed

defined      Preprocessor function that yields 1 if a name is defined as a preprocessor macro and 0 otherwise; used in #if and #elif

# operator   Replace a macro parameter with a string constant containing the parameter's value

## operator  Create a single token out of two adjacent tokens

#pragma      Specify implementation-dependent information to the compiler

#error       Produce a compile-time error with a designated message.

The preprocessor typically removes all preprocessor command lines from the source file and makes additional transformations on the source file as directed by the commands, such as expanding macro calls that occur within the source program text. The resulting preprocessed source text must then be a valid C program.

The syntax of preprocessor commands is completely independent of (although in some ways similar to) the syntax of the rest of the C language. For example, it is possible for a macro definition to expand into a syntactically incomplete fragment as long as the fragment makes sense (i.e., is properly completed) in all contexts in which the macro is called.
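Several of the commands and operators from the table can be combined in one short sketch. Everything named here (STR, CAT, FEATURE_LEVEL, MODE, foobar, mode_is, spelled_is) is hypothetical and only serves to illustrate the mechanics; FEATURE_LEVEL is assumed not to be defined anywhere else.

```c
#include <string.h>

#define STR(x)    #x      /* # turns the argument into a string constant */
#define CAT(a, b) a##b    /* ## pastes two tokens into a single token */

/* defined, #if, #elif, and #else select exactly one definition. */
#if defined(FEATURE_LEVEL) && FEATURE_LEVEL >= 2
#define MODE "full"
#elif defined(FEATURE_LEVEL)
#define MODE "basic"
#else
#define MODE "none"
#endif

int CAT(foo, bar) = 7;    /* declares a variable named foobar */

/* Returns 1 if the selected mode string equals s. */
int mode_is(const char *s)
{
    return strcmp(MODE, s) == 0;
}

/* Returns 1 if the stringized argument equals s. */
int spelled_is(const char *s)
{
    return strcmp(STR(count + 1), s) == 0;
}
```

With FEATURE_LEVEL undefined, mode_is("none") should be 1, and the stringized argument should read "count + 1".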

9 Preprocessor Lexical Conventions

The preprocessor does not parse the source text, but it does break it up into tokens for the purpose of locating macro calls. The lexical conventions of the preprocessor are somewhat different from those of the compiler proper; the preprocessor recognizes the normal C tokens, and additionally recognizes as "tokens" other characters that would not be recognized as valid in C proper. This enables the preprocessor to recognize file names, the presence and absence of whitespace, and the location of end-of-line markers.

A line beginning with # is treated as a preprocessor command; the name of the command must follow the # character. Standard C permits whitespace to precede and follow the # character on the same source line, but some older compilers do not. A line whose only non-whitespace character is a # is termed a null directive in Standard C and is treated the same as a blank line. Older implementations may behave differently.

The remainder of the line following the command name may contain arguments for the command if appropriate. If a preprocessor command takes no arguments, then the remainder of the command line should be empty except perhaps for whitespace characters or comments. Many pre-ISO compilers silently ignore all characters following the expected arguments(if any); this can lead to portability problems. The arguments to preprocessor commands are generally subject to macro replacement.

Preprocessor lines are recognized before macro expansion. Therefore, if a macro expands into something that looks like a preprocessor command, that command will not be recognized by the preprocessors in Standard C or in most other C compilers. (Some old UNIX implementations violate this rule.) For example, the result of the following code is not to include the file math.h in the program being compiled:

/* This example doesn't work as one might think ! */
#define GETMATH #include <math.h>
GETMATH
// Instead, the expanded token sequence
#include <math.h>
// is merely passed through and compiled as (erroneous) C code

All source lines (including preprocessor command lines) can be continued by preceding the end-of-line marker with a backslash character, \. This happens before scanning for preprocessor commands. For example:

            The preprocessor command

                  #define err(flag,msg) if(flag) \
                                         printf(msg)

             is the same as

                  #define err(flag,msg) if(flag) printf(msg)

             If the backslash character below immediately precedes the end-of-line marker, these two lines

                  #define BACKSLASH \
                  #define ASTERISK *

             will be treated as the single preprocessor command

                  #define BACKSLASH #define ASTERISK *

             so that BACKSLASH expands to the token sequence

                  #define ASTERISK *

The preprocessor treats comments as whitespace, and line breaks within comments do not terminate preprocessor commands. For example:

                  #define COMMENT/* first line

                                                * second line

                                                */ "comment"

              will be treated as:

                  #define COMMENT  "comment"

10 Definition and Replacement

Synopsis

The #define preprocessor command causes a name (identifier) to become defined as a macro to the preprocessor. A sequence of tokens, called the body of the macro, is associated with the name. When the name of the macro is recognized in the program source text or in the arguments of certain other preprocessor commands, it is treated as a call to that macro; the name is effectively replaced by a copy of the body. If the macro is defined to accept arguments, then the actual arguments following the macro name are substituted for formal parameters in the macro body.

Example:

If a macro sum with two arguments is defined by

#define sum(x,y) ((x)+(y))

then the preprocessor replaces the source program line

result = sum(5,a*b);

with the simple (and perhaps unintended) text substitution

result = ( (5) + (a*b) );

Since the preprocessor does not distinguish reserved words from other identifiers, it is possible, in principle, to use a C reserved word as the name of a preprocessor macro, but to do so is usually bad programming practice. Macro names are never recognized within comments, string or character constants, or #include file names.
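Because the replacement is purely textual, the parentheses in the body of sum matter. The sketch below contrasts a bare body with a parenthesized one; SUM_BARE, SUM, and the two functions are names invented for this comparison.

```c
#define SUM_BARE(x, y) x + y          /* body not parenthesized */
#define SUM(x, y)      ((x) + (y))    /* same shape as sum above */

/* Expands to 2 * a + b, which is probably not what was meant. */
int scaled_bare(int a, int b) { return 2 * SUM_BARE(a, b); }

/* Expands to 2 * ((a) + (b)). */
int scaled(int a, int b) { return 2 * SUM(a, b); }
```

For a = 1 and b = 2, scaled_bare gives 4 while scaled gives 6.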

Objectlike Macro Definitions

The #define command has two forms depending on whether a left parenthesis immediately follows the name to be defined. The simpler, objectlike form has no left parenthesis:

#define name sequence-of-tokens(optional)

An objectlike macro takes no arguments. It is invoked merely by mentioning its name. When the name is encountered in the source program text, the name is replaced by the body (the associated sequence-of-tokens, which may be empty). The syntax of the #define command does not require an equal sign or any other special delimiter token after the name being defined. The body starts right after the name.

The objectlike macro is particularly useful for introducing named constants into a program, so that a "magic number" such as the length of a table may be written in exactly one place and then referred to elsewhere by name. This makes it easier to change the number later.

Another important use of objectlike macros is to isolate implementation-dependent restrictions on the names of externally defined functions and variables.

Example:

When a C compiler permits long internal identifiers, but the target computer requires short external names, the preprocessor may be used to hide these short names:

#define error_handler eh73

extern void error_handler();

and can then be used as follows:

error_handler(…);

Here are some typical macro definitions:

#define BLOCK_SIZE 0x100

#define TRACK_SIZE (16*BLOCK_SIZE)

A common programming error is to include an extraneous equal sign:

#define NUMBER_DRIVERS = 5 /* probably wrong */

This is a valid definition, but it causes the name NUMBER_DRIVERS to be defined as "= 5" rather than "5". If one were then to write the code fragment

if ( count != NUMBER_DRIVERS ) …

it would be expanded to

if ( count != = 5 ) …

which is syntactically invalid. For similar reasons, also be careful to avoid an extraneous semicolon:

#define NUMBER_DRIVERS 5; /* probably wrong */
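The two typical definitions shown earlier can be exercised as follows. This is a sketch; track_buffer, track_bytes, and tracks_in are hypothetical names.

```c
#include <stddef.h>

#define BLOCK_SIZE 0x100
#define TRACK_SIZE (16*BLOCK_SIZE)

/* The named constant sizes a table in exactly one place. */
static unsigned char track_buffer[TRACK_SIZE];

size_t track_bytes(void) { return sizeof track_buffer; }

/* Without the parentheses in TRACK_SIZE, the division below would
 * parse as (bytes / 16) * BLOCK_SIZE. */
int tracks_in(int bytes) { return bytes / TRACK_SIZE; }
```

Here TRACK_SIZE is 16 * 256 = 4096, so tracks_in(8192) should be 2.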

Defining Macros with Parameters

The more complex, functionlike macro definition declares the names of formal parameters within parentheses separated by commas:

#define name( identifier-list(optional) ) sequence-of-tokens(optional)

where identifier-list is a comma-separated list of formal parameter names. In C99, an ellipsis (…; three periods) may also appear after identifier-list to indicate a variable argument list.

The left parenthesis must immediately follow the name of the macro with no intervening whitespace. If whitespace separates the left parenthesis from the macro name, the definition is considered to define a macro that takes no arguments and has a body beginning with a left parenthesis.

The names of the formal parameters must be identifiers, no two the same. There is no requirement that any of the parameter names must be mentioned in the body(although normally they are mentioned). A functionlike macro can have an empty formal parameter list(i.e. zero formal parameters). This kind of macro is useful to simulate a function that takes no arguments.

A functionlike macro takes as many actual arguments as there are formal parameters. The macro is invoked by writing its name, a left parenthesis, then one actual argument token sequence for each formal parameter, then a right parenthesis. The actual argument token sequences are separated by commas. (When a functionlike macro with no formal parameters is invoked, an empty actual argument list must be provided.) When a macro is invoked, whitespace may appear between the macro name and the left parenthesis or in the actual arguments. (Some older and deficient preprocessor implementations do not permit the actual argument token list to extend across multiple lines unless the lines to be continued end with a \.)

An actual argument token sequence may contain parentheses if they are properly nested and balanced, and it may contain commas if each comma appears within a set of parentheses. (This restriction prevents confusion with the commas that separate the actual arguments.) Braces and subscripting brackets likewise may appear within macro arguments, but they cannot contain commas and do not have to balance. Parentheses and commas appearing within character-constant and string-constant tokens are not counted in the balancing of parentheses and the delimiting of actual arguments.

In C99, arguments to macro can be empty, that is, consist of no tokens.

Example:

Here is the definition of a macro that multiplies its two arguments:

#define product(x,y) ((x)*(y))

It is invoked twice in the following statement:

x = product(a+3,b) + product(c,d);

The arguments to the product macro could be function(or macro) calls. The commas within the function argument list do not affect the parsing of the macro arguments:

return product( f(g,b), g(a,b) ); /* OK */

The getchar() macro has an empty parameter list:

#define getchar() getc(stdin)

When it is invoked, an empty argument list is provided:

while( (c=getchar()) != EOF ) …

(Note: getchar(), stdin, and EOF are defined in the standard header stdio.h.)

We can also define a macro that takes an arbitrary statement as its argument:

#define insert(stmt) stmt

The invocation

insert({a=1; b=1;})

works properly, but if we change the two assignment statements to a single statement containing two assignment expressions:

insert({a=1, b=1;})

then the preprocessor will complain that we have too many macro arguments for insert. To fix the problem, we would have to write:

insert( {(a=1, b=1);} )
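The comma rule can be tried out with the insert macro itself. In this sketch the function run_insert is a hypothetical wrapper, and the second invocation uses compound assignments so that its effect is visible in the result.

```c
#define insert(stmt) stmt

int run_insert(void)
{
    int a = 0, b = 0;

    insert({a = 1; b = 1;})        /* one argument: no top-level comma */
    insert({(a += 1, b += 2);})    /* comma hidden inside parentheses */

    /* insert({a = 1, b = 1;}) would be rejected: the preprocessor
     * sees two arguments, "{a = 1" and "b = 1;}". */
    return a * 10 + b;
}
```

After both invocations a is 2 and b is 3, so run_insert() should return 23.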

Defining functionlike macros to be used in statement contexts can be tricky. The following macro swaps the values in its two arguments, x and y, which are assumed to be of a type whose values can be converted to unsigned long and back without change, and to not involve the identifier _temp.

#define swap(x,y) {unsigned long _temp = x; x=y; y=_temp;}

The problem is that it is natural to want to place a semicolon after swap, as you would if swap were really a function:

if ( x > y ) swap (x, y); /* whoops */

else x = y;

This will result in an error since the expansion includes an extra semicolon. We put the expanded statements on separate lines next to illustrate the problem more clearly:

if ( x > y ) { unsigned long _temp = x; x = y; y = _temp; }

;

else x = y;

A clever way to avoid the problem is to define the macro body as a do-while statement, which consumes the semicolon:

#define swap(x, y) \
    do { unsigned long _temp = x; x = y; y = _temp; } while(0)
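To see that the do-while form behaves like a single statement, here is a minimal sketch that uses it in exactly the if/else position that caused trouble before. The function order is a hypothetical helper.

```c
#define swap(x, y) \
    do { unsigned long _temp = x; x = y; y = _temp; } while (0)

/* Puts the smaller value in *lo and the larger in *hi.
 * Returns 1 if the values were exchanged, 0 otherwise. */
int order(unsigned long *lo, unsigned long *hi)
{
    if (*lo > *hi)
        swap(*lo, *hi);   /* the semicolon completes the do-while */
    else
        return 0;         /* the else still pairs with the if */
    return 1;
}
```

Starting from 5 and 2, one call to order should exchange them and return 1; a second call should leave them alone and return 0.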

When a functionlike macro call is encountered, the entire macro call is replaced, after parameter processing, by a processed copy of the body. Parameter processing proceeds as follows. Actual argument token strings are associated with the corresponding formal parameter names. A copy of the body is then made in which every occurrence of a formal parameter name is replaced by a copy of the actual argument token sequence associated with it. This copy of the body then replaces the macro call. The entire process of replacing a macro call with the processed copy of its body is called macro expansion; the processed copy of the body is called the expansion of the macro call.

Example:

Consider this macro definition, which provides a convenient way to make a loop that counts from a given value up to(and including) some limit:

#define incr(v,low,high) \
for( (v) = (low); (v) <= (high); ++(v) )

To print a table of the cubes of the integers from 1 to 20, we could write:

#include <stdio.h>

int main()

{

int j;

incr(j,1,20)

printf("%2d %6d\n", j, j*j*j);

return 0;

}

The call to the macro incr is expanded to produce this loop:

for( (j) = (1); (j) <= (20); ++(j) )

The liberal use of parentheses ensures that complicated actual arguments are not misinterpreted by the compiler.

Rescanning of Macro Expressions

Once a macro call has been expanded, the scan for macro calls resumes at the beginning of the expansion so that names of macros may be recognized within the expansion for the purpose of further macro replacement. Macro replacement is not performed on any part of a #define command, not even the body, at the time the command is processed and the macro name defined. Macro names are recognized within the body only after the body has been expanded for some particular macro call.

Macro replacement is also not performed within the actual argument token string of a functionlike macro call at the time the macro call is being scanned. Macro names are recognized within actual argument token strings only during the rescanning of the expansion, assuming that the corresponding formal parameter in fact occurred one or more times within the body(thereby causing the actual argument token string to appear one or more times in the expansion).

Example:

Given the following definitions:

#define plus(x,y) add(y,x)

#define add(x,y) ((x)+(y))

The invocation

plus(plus(a,b),c)

is expanded as shown next.

Step           Result
1 (original)   plus(plus(a,b),c)
2              add(c,plus(a,b))
3              ((c)+(plus(a,b)))
4              ((c)+(add(b,a)))
5 (final)      ((c)+(((b)+(a))))
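The rule that a macro body is scanned for macro names only when it is expanded, not when it is defined, can be checked with a small sketch. plus and add are the macros from the example; AREA, HEIGHT, nested_sum and area_of are hypothetical names introduced only for this illustration.

```c
#define plus(x,y) add(y,x)        /* add is not defined yet: allowed */
#define add(x,y)  ((x)+(y))

#define AREA(w)   ((w) * HEIGHT)  /* HEIGHT is not defined yet either */
#define HEIGHT    3

/* Expands step by step to ((c)+(((b)+(a)))) as in the table above. */
static int nested_sum(int a, int b, int c) { return plus(plus(a,b),c); }

/* HEIGHT is recognized when AREA is expanded here, not at its #define. */
static int area_of(int w) { return AREA(w); }
```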

Macros appearing in their own expansion–either immediately or through some intermediate sequence of nested macro expansions–are not reexpanded in Standard C. This permits a programmer to redefine a function in terms of its old function. Older C preprocessors traditionally do not detect this recursion, and will attempt to continue the expansion until they are stopped by some system error.

Example:

The following macro changes the definition of the square root function to handle negative arguments in a different fashion than is normal:

#define sqrt(x) ( (x) < 0 ? sqrt(-(x)) : sqrt(x) )

Except that it evaluates its argument more than once, this macro works as intended in Standard C, but might cause an error in older compilers. Similarly:

#define char unsigned char
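A self-referential macro can also be used to wrap an existing function. The following is a sketch under the Standard C rule described above; twice, calls and demo are hypothetical names chosen for the illustration.

```c
static int calls = 0;                       /* hypothetical call counter */

static int twice(int n) { return 2 * n; }   /* the "old" function */

/* From here on, twice(...) expands to the counting wrapper. The inner
   twice is not re-expanded, so it names the function defined above. */
#define twice(n) (++calls, twice(n))

static int demo(void) { return twice(3) + twice(4); }
```

The macro must be defined after the function definition; otherwise the function's own header would be expanded as a macro call.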

Predefined Macros

Preprocessors for Standard C are required to define certain objectlike macros. The name of each begins and ends with two underscore characters. None of these predefined macros may be undefined (#undef) or redefined by the programmer.

The __LINE__ and __FILE__ macros are useful when printing certain kinds of error messages. The __DATE__ and __TIME__ macros can be used to record when a compilation occurred. The values of __TIME__ and __DATE__ remain constant throughout the compilation. The values of the __FILE__ and __LINE__ macros are established by the implementation, but are subject to alteration by the #line directive (for example, #line 300 or #line 500 "cppgp.c"). The C99 predefined identifier __func__ is similar in purpose to __LINE__, but is actually a block-scope variable, not a macro. It supplies the name of the enclosing function.
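A short sketch of these names in use, assuming a C99 compiler. REPORT, name_of_this_function and line_after_directive are hypothetical names invented for the illustration; they are not part of any standard header.

```c
#include <stdio.h>

/* REPORT is a hypothetical helper macro. __FILE__, __LINE__ and __func__
   are evaluated at the place where REPORT is used. */
#define REPORT(msg) \
    fprintf(stderr, "%s:%d: in %s: %s\n", __FILE__, __LINE__, __func__, (msg))

static const char *name_of_this_function(void)
{
    REPORT("called");
    return __func__;        /* the name of the enclosing function */
}

static int line_after_directive(void)
{
#line 300
    return __LINE__;        /* #line 300 makes this source line number 300 */
}
```

Note that the #line directive keeps affecting the line numbers reported for the rest of the file, so it is rarely written by hand outside of generated code.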

The __STDC__ and __STDC_VERSION__ macros are useful for writing code compatible with Standard and non-Standard C implementations. The __STDC_HOSTED__ macro was introduced in C99 to distinguish hosted from freestanding implementations. The remaining C99 macros indicate whether the implementation's floating-point and wide character facilities adhere to other relevant international standards. (Adherence is recommended, but not required.)

Implementations routinely define additional macros to communicate information about the environment, such as the type of computer for which the program is being compiled. Exactly which macros are defined is implementation-dependent, although UNIX implementations customarily predefine unix. Unlike the built-in macros, these macros may be undefined. Standard C requires implementation-specific macro names to begin with a leading underscore followed by either an uppercase letter or another underscore. (The macro unix does not meet that criterion.)

An example that uses the predefined macros will be appended under the next subject.

Undefining and Redefining Macros

The #undef command can be used to make a name be no longer defined:

#undef name

This command causes the preprocessor to forget any macro definition of name. It is not an error to undefine a name currently not defined. Once a name has been undefined, it may then be given a completely new definition(using #define) without error. Macro replacement is not performed within #undef commands.
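The behavior of #undef can be sketched as follows. TABLE_SIZE, first_size and second_size are hypothetical names used only for this illustration.

```c
#undef TABLE_SIZE            /* not defined yet: permitted, has no effect */

#define TABLE_SIZE 64
static int first_size(void) { return TABLE_SIZE; }    /* expands to 64 */

#undef TABLE_SIZE            /* forget the old definition */
#define TABLE_SIZE 256       /* a completely new definition is now allowed */
static int second_size(void) { return TABLE_SIZE; }   /* expands to 256 */
```

Each use of the name is expanded with whatever definition is in effect at that point in the source file, so the first function is not affected by the later redefinition.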

The benign redefinition of macros is allowed in Standard C and many other implementations. That is, a macro may be redefined if the new definition is the same, token for token, as the existing definition. The redefinition must include whitespace in the same locations as in the original definition, although the particular whitespace characters can be different. We think programmers should avoid depending on benign redefinitions. It is generally better style to have a single point of definition for all program entities, including macros. (Some older implementations of C may not allow any kind of redefinition.)

Example:

In the following definitions, the redefinition of NULL is allowed, but neither redefinition of FUNC is valid. (The first includes whitespace not in the original definition, and the second changes two tokens.)

#define NULL 0

#define FUNC(x) x+4

#define NULL /* null pointer */ 0

#define FUNC(x) x + 4

#define FUNC(y) y+4

(However, when I tested this on Fedora 10 with gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC), both FUNC redefinitions were accepted as well. Why?)

When the programmer for legitimate reasons cannot tell if a previous definition exists, the #ifndef command can be used to test for an existing definition so that a redefinition can be avoided:

#ifndef MAX_TABLE_SIZE

#define MAX_TABLE_SIZE 1000

#endif

This idiom is particularly useful with implementations that allow macro definitions in the command that invokes the C compiler. For example, the following UNIX invocation of C provides an initial definition of the macro MAX_TABLE_SIZE as 5000. The C programmer would then check for the definition as shown before:

cc -c -DMAX_TABLE_SIZE=5000 prog.c

Although disallowed in Standard C, a few older preprocessor implementations handle #define and #undef so as to maintain a stack of definitions. When a name is redefined with a #define, its old definition is pushed onto a stack and then the new definition replaces the old one. When a name is undefined with #undef, the current definition is discarded and the most recent previous definition (if any) restored.

Precedence Errors In Macro Expansions

Macros operate purely by textual substitution of tokens. Parsing of the body into declarations, expressions, or statements occurs only after the macro expansion process. This can lead to surprising results if care is not taken. As a rule, it is safest to always parenthesize each parameter appearing in the macro body. The entire body, if it is syntactically an expression, should also be parenthesized.

Example:

Consider this macro definition:

#define SQUARE(x) x*x

The idea is that SQUARE takes an argument expression and produces a new expression to compute the square of that argument. For example, SQUARE(5) expands to 5*5. However, the expression SQUARE(z+1) expands to z+1*z+1, which is parsed as z+(1*z)+1 rather than the expected (z+1)*(z+1). A definition of SQUARE that avoids this problem is:

#define SQUARE(x) ((x)*(x))

The outer parentheses are needed to prevent misinterpretation of an expression such as (short)SQUARE(z+1).
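The effect of each layer of parentheses can be seen by comparing three variants. SQUARE is the definition from the text; SQUARE_NAIVE, SQUARE_INNER_ONLY and the three functions are hypothetical names added for the comparison.

```c
#define SQUARE_NAIVE(x)      x*x          /* no parentheses at all */
#define SQUARE_INNER_ONLY(x) (x)*(x)      /* parameters only */
#define SQUARE(x)            ((x)*(x))    /* parameters and whole body */

/* Expands to z+1*z+1, which is parsed as z+(1*z)+1. */
static int naive(int z)      { return SQUARE_NAIVE(z+1); }

/* Expands to 100 / (z+1)*(z+1), which is parsed as (100/(z+1))*(z+1). */
static int inner_only(int z) { return 100 / SQUARE_INNER_ONLY(z+1); }

/* Expands to 100 / ((z+1)*(z+1)), the intended meaning. */
static int safe(int z)       { return 100 / SQUARE(z+1); }
```

For z equal to 4, the first function yields 9 instead of 25, the second yields 100 instead of 4, and only the fully parenthesized form gives 4.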

Side Effects In Macro Arguments

Macros can also produce problems due to side effects. Because the macro's actual arguments may be textually replicated, they may be executed more than once, and side effects in the actual arguments may occur more than once. In contrast, a true function call–which the macro invocation resembles–evaluates argument expressions exactly once, so any side effects of the expression occur exactly once. Macros must be used with care to avoid such problems.

Example:

Consider the macro SQUARE from the prior example and also a function square that does (almost) the same thing:

int square(int x) { return x*x; }

The macro can square integers or floating-point numbers; the function can square only integers. Also, calling the function is likely to be somewhat slower at run time than using the macro. But these differences are less important than the problem of side effects. In the program fragment

a = 3;

b = square(a++);

the variable b gets the value 9 and the variable a ends up with the value 4. However, in the superficially similar program fragment

a = 3;

b = SQUARE(a++);

the variable b may get the value 12 and the variable a may end up with the value 5 because the expansion of the last fragment is

a = 3;

b = ((a++)*(a++));

(We say that 12 and 5 may be the resulting values of b and a because Standard C implementations may evaluate the expression ((a++)*(a++)) in different ways.)
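The duplicated evaluation can be observed without relying on the unpredictable a++ case by giving the argument a countable side effect. This is a sketch only: next_value, calls, via_function and via_macro are hypothetical names introduced here.

```c
#define SQUARE(x) ((x)*(x))

static int square(int x) { return x*x; }

static int calls = 0;                             /* counts evaluations */
static int next_value(void) { return ++calls; }   /* the side effect */

/* The argument is evaluated once: one call, result 1*1. */
static int via_function(void) { calls = 0; return square(next_value()); }

/* The argument text appears twice in the expansion: two calls, which
   return 1 and 2 in some order, so the product is 2 either way. */
static int via_macro(void)    { calls = 0; return SQUARE(next_value()); }
```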

Converting Tokens to Strings

There is a mechanism in Standard C to convert macro parameters (after expansion) to string constants. Before this, programmers had to depend on a loophole in many C preprocessors that achieved the same result in a different way.

In Standard C, the # token appearing within a macro definition is recognized as a unary "stringization" operator that must be followed by the name of a macro formal parameter. During macro expansion, the # and the formal parameter name are replaced by the corresponding actual argument enclosed in string quotes. When creating the string, each sequence of whitespace in the argument's token list is replaced by a single space character, and any embedded quotation or backslash characters are preceded by a backslash character to preserve their meaning in the string. Whitespace at the beginning and end of the argument is ignored, so an empty argument (even with whitespace between the commas) expands to the empty string "".

Example:

Consider the Standard C definition of macro TEST:

#define TEST(a, b) printf( #a " < " #b " = %d\n", (a)<(b) )

The statements TEST(0, 0XFFFF); TEST('\n', 10); would expand into

printf("0" " < " "0XFFFF" " = %d\n", (0)<(0XFFFF));

printf("'\\n'" " < " "10" " = %d\n", ('\n')<(10));

After concatenation of adjacent strings, these become:

printf("0 < 0XFFFF = %d\n", (0)<(0XFFFF));

printf("'\\n' < 10 = %d\n", ('\n')<(10));
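The whitespace and escaping rules for # can be checked directly. The sketch below assumes a C99 compiler (for the empty argument); STR and the four functions are hypothetical names used only here.

```c
#define STR(x) #x

/* Leading and trailing whitespace is dropped, inner runs become one space. */
static const char *spaced(void)  { return STR(  a   +   b  ); }

/* Quotes and backslashes inside string and character constants are escaped. */
static const char *quoted(void)  { return STR("hi\n"); }
static const char *newline(void) { return STR('\n'); }

/* An empty argument yields the empty string. */
static const char *empty(void)   { return STR(); }
```

Here spaced() returns the characters a + b, quoted() returns the seven characters "hi\n" with the quotes and backslash kept literally, and newline() returns the four characters '\n'.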

A number of non-standard C compilers will substitute for macro formal parameters inside string and character constants. Standard C prohibits this.

The handling of whitespace in non-ISO implementations is likely to vary from compiler to compiler–another reason to avoid depending on this feature except in Standard C implementations.

Token Merging In Macro Expansions

Merging of tokens to form new tokens in Standard C is controlled by the presence of a merging operator, ##, in macro definitions. In a macro replacement list–before rescanning for more macros–the two tokens surrounding any ## operator are combined into a single token. There must be such tokens: ## must not appear at the beginning or end of a replacement list. If the combination does not form a valid token, the result is undefined.

#define TEMP(i) temp ## i

TEMP(1) = TEMP(2+k) + x;

After preprocessing, this becomes

temp1 = temp2 + k + x;

In the previous example, a curious situation can arise when expanding TEMP() + x. The macro definition is valid, but ## is left with no right-hand token to combine (unless it grabs +, which we do not want). This problem is resolved by treating the formal parameter i as if it expanded to a special "empty" token just for the benefit of ##. Thus, the expansion of TEMP() + x would be temp + x as expected.

Token concatenation must not be used to produce a universal character name.
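The three cases discussed for TEMP can be compiled as a small sketch, assuming a C99 compiler for the empty argument. The variables and function names are hypothetical and exist only for this illustration.

```c
#define TEMP(i) temp ## i

static int temp1 = 10, temp2 = 20, temp = 1;

/* TEMP(1) + TEMP(2) becomes temp1 + temp2. */
static int sum_numbered(void) { return TEMP(1) + TEMP(2); }

/* Only the first token of the argument is merged: temp2 + k. */
static int with_expr(int k)   { return TEMP(2+k); }

/* The empty argument leaves just temp, so this is temp + x. */
static int with_empty(int x)  { return TEMP() + x; }
```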

As with the conversion of macro arguments to strings, programmers can obtain something like this merging capability through a loophole in many non-Standard C implementations. Although the original definition of C explicitly described macro bodies as being sequences of tokens, not sequences of characters, nevertheless many C compilers expand and rescan macro bodies as if they were character sequences. This becomes apparent primarily in the case where the compiler also handles comments by eliminating them entirely (rather than replacing them with a space)–a situation exploited by some cleverly written programs.

Example:

Consider the following example:

#define INC ++

#define TAB internal_table

#define INCTAB table_of_increments

#define CONC(x,y) x/**/y

CONC(INC,TAB)

Standard C interprets the body of CONC as two tokens, x and y, separated by a space. (Comments are converted to a space.) The call CONC(INC,TAB) expands to the two tokens INC TAB. However, some non-Standard implementations simply eliminate comments and rescan macro bodies for tokens; they expand CONC(INC,TAB) to the single token INCTAB.

Step           1               2            3         4
Standard       CONC(INC,TAB)   INC/**/TAB   INC TAB   ++ internal_table
non-Standard   CONC(INC,TAB)   INC/**/TAB   INCTAB    table_of_increments

Variable Argument Lists In Macros

In C99, a functionlike macro can have as its last or only formal parameter an ellipsis, signifying that the macro may accept a variable number of arguments:

#define name( identifier-list, ... ) sequence-of-tokens(optional)

#define name( ... ) sequence-of-tokens(optional)

When such a macro is invoked, there must be at least as many actual arguments as there are identifiers in identifier-list. The trailing argument(s), including any separating commas, are merged into a single sequence of preprocessing tokens called the variable arguments. The identifier __VA_ARGS__ appearing in the replacement list of the macro definition is treated as if it had been a macro parameter whose argument was the merged variable arguments. That is, __VA_ARGS__ is replaced by the list of extra arguments, including their comma separators. __VA_ARGS__ can only appear in a macro definition that includes ... in its parameter list.

Macros with a variable number of arguments are often used to interface to functions that take a variable number of arguments, such as printf. By using the # stringization operator, they can also be used to convert a list of arguments to a single string without having to enclose the arguments in parentheses.

Example:

These directives create a macro my_printf that can write its arguments to either the standard error or the standard output.

#ifdef DEBUG

#define my_printf(...) fprintf(stderr, __VA_ARGS__)

#else

#define my_printf(...) fprintf(stdout, __VA_ARGS__)

#endif

Given the definition

#define make_em_a_string(...) #__VA_ARGS__

the invocation

make_em_a_string(a, b, c, d)

expands to the string

"a, b, c, d"
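Both uses of __VA_ARGS__ can be exercised in a short C99 sketch. make_em_a_string is the macro from the text; FORMAT_INTO, listed and pair_text are hypothetical names invented for the illustration.

```c
#include <stdio.h>

#define make_em_a_string(...) #__VA_ARGS__

/* FORMAT_INTO is a hypothetical macro: it forwards its variable
   arguments (format string included) to snprintf. buf must be an
   array, not a pointer, so that sizeof gives its capacity. */
#define FORMAT_INTO(buf, ...) snprintf((buf), sizeof (buf), __VA_ARGS__)

/* The whole argument list, commas included, becomes one string. */
static const char *listed(void) { return make_em_a_string(a, b, c, d); }

static const char *pair_text(int x, int y)
{
    static char text[32];
    FORMAT_INTO(text, "%d,%d", x, y);   /* snprintf(text, 32, "%d,%d", x, y) */
    return text;
}
```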

Other Problems

Some non-Standard implementations do not perform stringent error checking on macro definitions and calls, including permitting an incomplete token in the macro body to be completed by text appearing after the macro call. The lack of error checking by certain implementations does not make clever exploitation of that lack legitimate. Standard C reaffirms that macro bodies must be sequences of well-formed tokens.

Example:

For example, the following fragment in one of these non-ISO implementations:

#define STRING_START_PART "This is a split

…

printf(STRING_START_PART string."); /* !!!! Yuk */

will, after preprocessing, result in the source text

printf("This is a split string.");
