Names specified here
Name      Description       Notes  Source      Availability
__DATE__  Compilation date  L M    Predefined  C89 C90 C95 C99 C11
__TIME__  Compilation time  L M    Predefined  C89 C90 C95 C99 C11

A C program consists of one or more source files, with names conventionally ending in .c. Each source file undergoes a process of translation, often called compilation (although interpreters also exist), and the results of these processes are linked to produce the executable program. Each translation process is independent of any other translation process, so nothing learned during one process is retained to help another translation. Headers are used to ensure that information that must be shared across source files is consistent during their separate translation processes.

Source files and headers are composed of characters from the source character set, which includes all of the following:

0 1 2 3 4 5 6 7 8 9
a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

There is also a space character, and some control characters: vertical tab, horizontal tab, and form feed. These characters form the basic source character set, but there may be additional characters, depending on the locale that applies during translation. Finally, the source text is split into lines using an implementation-defined line terminator.

The output of a translation process is usually a file, often with a name related to the source file, and ending with .o or .obj, such that foo.c translates to foo.o, for example. However, this concept of a ‘file’ is internal to the implementation, and need not correspond to a real file.

There are several phases of translation:

  1. In the first phase, the source file is interpreted as a series of characters in the source character set. Line terminators are translated into new-line characters. Each trigraph is interpreted as a single character:

    • ??= for #
    • ??( for [
    • ??) for ]
    • ??< for {
    • ??> for }
    • ??! for |
    • ??' for ^
    • ??- for ~
    • ??/ for \

    In most cases, notwithstanding trigraphs, this phase will be a no-op, as most implementations will want to keep things simple by using the native representation of characters.

    From this phase onwards, each source character is considered a unit, even if it was originally represented by multiple bytes.

  2. Each backslash character \ followed by a new-line character is deleted. This allows long lines to be split for readability, without changing their meaning. This is especially useful in macro definitions, which have no other way of being split.

  3. The source file is parsed as file-tokens. Each comment is replaced by a space sp.

    file-tokens:
        file-token
        file-tokens file-token

    file-token:
        preprocessing-token
        new-line
        sp
        other white-space characters

    preprocessing-token:
        header-name (only as part of an #include directive)
        identifier
        pp-number
        character-constant
        string-literal
        punctuator
        any other character that does not match the other productions

    new-line:
        the new-line character

    pp-number:
        digit
        . digit
        pp-number digit
        pp-number identifier-nondigit
        pp-number e sign
        pp-number E sign
        pp-number p sign
        pp-number P sign
        pp-number .

    punctuator:
        any of [ ] ( ) { } . -> ++ -- & * + - ~ ! / % << >> < > <= >= == != ^ | && || ? : ; ... = *= /= %= += -= <<= >>= &= ^= |= , # ## <: :> <% %> %: %:%:, matching longer sequences first

    sign:
        +
        -

    digit:
        any of 0 1 2 3 4 5 6 7 8 9

  4. The file-tokens are parsed against preprocessing-file, and are thus scanned in sequence for preprocessing directives, macro invocations and _Pragma expressions. Directives and _Pragma expressions are executed, and macro invocations are expanded recursively. Preprocessing directives are then deleted.

    When an #include directive is executed, its specified file undergoes the first four translation phases, and its resulting file-tokens replace the directive.

  5. Each character constant is converted into a single character in the execution character set. Each string literal is converted into a sequence of characters in the execution character set.

    Implementation-defined behaviour occurs if some source characters cannot be converted to corresponding execution characters.

  6. Adjacent string literals are concatenated into one.

  7. new-line and sp are discarded, leaving only preprocessing-tokens. Each preprocessing-token then becomes a token, and the new sequence of tokens is parsed as a translation-unit.

    token:
        keyword
        identifier
        constant
        string-literal
        punctuator
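
The trigraph replacement of phase 1 can be sketched as an ordinary function over a character buffer. This is only an illustration of the mapping listed above, not how any particular implementation does it; the names trigraph_replacement and replace_trigraphs are hypothetical, and the sketch assumes the whole source text fits in one NUL-terminated string.

```c
#include <stddef.h>

/* Map the third character of a candidate trigraph to its replacement.
   Returns 0 if the character does not complete a trigraph.
   (Hypothetical helper, for illustration only.) */
static char trigraph_replacement(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '!':  return '|';
    case '\'': return '^';
    case '-':  return '~';
    case '/':  return '\\';
    default:   return 0;
    }
}

/* Replace every trigraph in the NUL-terminated string s, in place,
   scanning left to right. Returns the new length of the string. */
size_t replace_trigraphs(char *s)
{
    size_t in = 0, out = 0;

    while (s[in] != '\0') {
        char r = 0;

        /* Only two question marks followed by a listed character count. */
        if (s[in] == '?' && s[in + 1] == '?')
            r = trigraph_replacement(s[in + 2]);

        if (r != 0) {
            s[out++] = r;   /* three source characters become one */
            in += 3;
        } else {
            s[out++] = s[in++];
        }
    }
    s[out] = '\0';
    return out;
}
```

When calling this from C source, it is safest to write the question marks in a test string with the \? escape, so that the compiler's own phase 1 cannot replace them before the function ever sees them.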

The ordering of these phases is very important. Consider this program and its output:

#include <stdio.h>

char foo1[] = "aa\
nn";
char foo2[] = "aa\\
nn";

int main(void)
{
  printf("foo1: [%s]\n", foo1);
  printf("foo2: [%s]\n", foo2);
  return 0;
}

It prints:

foo1: [aann]
foo2: [aa
n]

foo1 has a string literal split over two lines using a trailing backslash. foo2 appears to be illegally split, because its first line ends with a pair of backslashes, which would normally be translated to a single, literal backslash. However, this doesn't happen: the second of these backslashes and its following new-line character are deleted in phase 2, so the two lines are first joined to give:

char foo2[] = "aa\nn";

The first backslash and the n that now follows it are then interpreted in the later phase 5 as a new-line character in the execution character set.
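
To make the effect of phase 2 on foo2 concrete, here is a minimal sketch of line splicing applied to a buffer holding the source text. The function name splice_lines is hypothetical, and the sketch assumes line terminators have already been translated to new-line characters, as phase 1 requires.

```c
#include <stddef.h>

/* Delete each backslash that is immediately followed by a new-line,
   together with that new-line, in place.
   Returns the new length of the NUL-terminated string s. */
size_t splice_lines(char *s)
{
    size_t in = 0, out = 0;

    while (s[in] != '\0') {
        if (s[in] == '\\' && s[in + 1] == '\n')
            in += 2;                /* drop the pair, joining the lines */
        else
            s[out++] = s[in++];     /* keep everything else unchanged */
    }
    s[out] = '\0';
    return out;
}
```

Given the source text of the foo2 initialiser (a quote, aa, two backslashes, a new-line, nn, a quote), the function removes only the second backslash and the new-line. The first backslash survives and ends up next to the n, which is why phase 5 later sees the escape sequence \n.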

__DATE__ and __TIME__ are macros expanding to the date and time respectively of a moment during translation. __DATE__ has the format Mmm dd yyyy, where Mmm is the month name as generated by asctime, dd is the day of the month with a leading space if necessary, and yyyy is the year. __TIME__ has the format hh:mm:ss for hours, minutes and seconds.

See also __FILE__ and __LINE__.
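
As a small illustration of the two macros, the sketch below builds a single build-stamp string from them. It relies on both macros expanding to string literals, which are then joined by the adjacent-literal concatenation of phase 6. The names build_stamp and stamp_is_well_formed are hypothetical, and the format check covers only the layout described above (lengths and separator positions), not the actual values.

```c
#include <string.h>

/* The macros expand in phase 4; the three adjacent string literals
   are concatenated in phase 6 into "Mmm dd yyyy hh:mm:ss". */
const char *build_stamp(void)
{
    return __DATE__ " " __TIME__;
}

/* Return 1 if s has the layout "Mmm dd yyyy hh:mm:ss", judged only by
   its length and the positions of the spaces and colons; 0 otherwise. */
int stamp_is_well_formed(const char *s)
{
    return strlen(s) == 20
        && s[3] == ' ' && s[6] == ' ' && s[11] == ' '
        && s[14] == ':' && s[17] == ':';
}
```

A program might print build_stamp() in its version banner. Because the macros are fixed during translation, the string records when that source file was translated, not when the program is run.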


Page updated 2024-06-10T19:54:01.041+0000