Name | Description | Notes | Source | Availability
---|---|---|---|---
__DATE__ | Compilation date | L, M | Predefined | C89, C90, C95, C99, C11
__TIME__ | Compilation time | L, M | Predefined | C89, C90, C95, C99, C11
A C program consists of several source files, with names conventionally ending in .c. Each source file undergoes a process of translation, often called compilation (although interpreters also exist), and the results of these processes are linked to produce the executable program. Each translation process is independent of any other translation process, so nothing learned during one process is retained to help another translation. Headers are used to ensure that information that must be shared across source files is consistent during their separate translation processes.
Source files and headers are composed of characters from the source character set, which includes all of the following:
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
There is also a space character, and some control characters: vertical tab, horizontal tab, and form feed. These characters form the basic source character set, but there may be additional characters, depending on the locale that applies during translation. Finally, the source text is split into lines using an implementation-defined line terminator.
The output of a translation process is usually a file, often with a name related to the source file, and ending with .o or .obj, such that foo.c translates to foo.o, for example. However, this concept of a ‘file’ is internal to the implementation, and need not correspond to a real file.
There are several phases of translation:
-
In the first phase, the source file is interpreted as a series of characters in the source character set. Line terminators are translated into new-line characters. Each trigraph is interpreted as a single character:
??= for #
??( for [
??) for ]
??< for {
??> for }
??! for |
??' for ^
??- for ~
??/ for \
In most cases, notwithstanding trigraphs, this phase will be a no-op, as most implementations will want to keep things simple by using the native representation of characters.
From this phase onward, each source character is considered a unit, even if it was originally represented by multiple bytes.
-
Each backslash character \ followed by a new-line character is deleted. This allows long lines to be split for readability, without changing their meaning. This is especially useful in macro definitions, which have no other way of being split.
-
The source file is parsed as file-tokens. Each comment is replaced by a space sp.
- file-tokens
  file-token
  file-tokens file-token
- file-token
  preprocessing-token
  new-line
  sp (other white-space characters)
- preprocessing-token
  header-name (only as part of an #include directive)
  identifier
  pp-number
  character-constant
  string-literal
  punctuator
  any other character that does not match the other productions
- new-line
  the new-line character
- pp-number
  digit
  . digit
  pp-number digit
  pp-number identifier-nondigit
  pp-number e sign
  pp-number E sign
  pp-number p sign
  pp-number P sign
  pp-number .
- punctuator
- sign
  +
  -
- digit
  any of 0 1 2 3 4 5 6 7 8 9
-
The file-tokens are parsed against preprocessing-file, and are thus scanned in sequence for preprocessing directives, macro invocations and _Pragma expressions. Directives and _Pragma expressions are executed, and macro invocations are expanded recursively. Preprocessing directives are then deleted.
When an #include directive is executed, its specified file undergoes the first four translation phases, and its resulting file-tokens replace the directive.
-
Within character constants and string literals, each source character and each escape sequence (such as \n) is converted into the corresponding character in the execution character set.
Implementation-defined behaviour occurs if some source characters cannot be converted to corresponding execution characters.
-
Adjacent string literals are concatenated into one.
-
new-line and sp are discarded, leaving only preprocessing-tokens. Each preprocessing-token then becomes a token, and the new sequence of tokens is parsed as a translation-unit.
- token
keyword
identifier
constant
string-literal
punctuator
The ordering of these phases is very important. Consider this program and its output:
#include <stdio.h>
char foo1[] = "aa\
nn";
char foo2[] = "aa\\
nn";
int main(void)
{
    printf("foo1: [%s]\n", foo1);
    printf("foo2: [%s]\n", foo2);
    return 0;
}
foo1: [aann]
foo2: [aa
n]
foo1 has a string literal split over two lines using a trailing backslash. foo2 appears to be illegally split, because its first line ends with a pair of backslashes, which should be translated to a single, literal backslash. However, this doesn't happen, as the second of these backslashes and its following new-line character are deleted in phase 2, so the two lines are first joined to give:
char foo2[] = "aa\nn";
The first backslash and its following n are then interpreted in the later phase 5 as a literal new-line character in the execution character set.
__DATE__ and __TIME__ are macros expanding to string literals giving the date and time respectively of a moment during translation. __DATE__ has the format Mmm dd yyyy, where Mmm is the month name as generated by asctime, dd is the day of the month with a leading space if necessary, and yyyy is the year. __TIME__ has the format hh:mm:ss for hours, minutes and seconds.