
Poster with the 8 phases of translation in the C language

c
  •  17
  • hlovdal  · 15 years ago

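    Does anyone have a reference to a poster, one-page PDF, or something similar that lists the eight translation phases of C (the first one being trigraph translation)? I would like to hang a printed copy on the wall next to my computer.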

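    Update: Sorry for forgetting to specify. I am interested in C90, although C99 is probably quite close. The _Pragma operator mentioned in pmg's answer is specific to C99, and I would like to avoid it.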

    2 replies  |  latest 15 years ago
        1
  •  37
  •   Christoph    15 years ago

    ASCII art for the win:

                           ANSI C translation phases
                           =========================
    
              +-------------------------------------------------+
              | map physical characters to source character set |
              |     replace line terminators with newlines      |
              |           decode trigraph sequences             |
              +-------------------------------------------------+
                                       |
                                       V
                   +---------------------------------------+
                   | join lines along trailing backslashes |
                   +---------------------------------------+
                                       |
                                       V
         +-------------------------------------------------------------+
         | decompose into preprocessing tokens and whitespace/comments |
         |                      strip comments                         |
         |                      retain newlines                        |
         +-------------------------------------------------------------+        
                                       |
                                       V
              +------------------------------------------------+
              | execute preprocessing directives/invoke macros |
              |              process included files            |
              +------------------------------------------------+
                                       |
                                       V
       +----------------------------------------------------------------+
       | decode escape sequences in character constants/string literals |
       +----------------------------------------------------------------+
                                       |
                                       V
                    +--------------------------------------+
                    | concatenate adjacent string literals |
                    +--------------------------------------+
                                       |
                                       V
                  +------------------------------------------+
                  | convert preprocessing tokens to C tokens |
                  |       analyze and translate tokens       |
                  +------------------------------------------+
                                       |
                                       V
                        +-----------------------------+
                        | resolve external references |
                        |        link libraries       |
                        |      build program image    |
                        +-----------------------------+
    
        2
  •  11
  •   pmg    15 years ago

    Almost directly from the most current draft of the revised C99 standard; I did some reformatting.
    A print screen of it should do the job.

    5.1.1.2 Translation phases
    
    1 The precedence among the syntax rules of translation is specified by the following
    phases. (*5)
        1. Physical source file multibyte characters are mapped, in an implementation-
           defined manner, to the source character set (introducing new-line characters for
           end-of-line indicators) if necessary. Trigraph sequences are replaced by
           corresponding single-character internal representations.
        2. Each instance of a backslash character (\) immediately followed by a new-line
           character is deleted, splicing physical source lines to form logical source lines.
           Only the last backslash on any physical source line shall be eligible for being part
           of such a splice. A source file that is not empty shall end in a new-line character,
           which shall not be immediately preceded by a backslash character before any such
           splicing takes place.
        3. The source file is decomposed into preprocessing tokens (*6) and sequences of
           white-space characters (including comments). A source file shall not end in a
           partial preprocessing token or in a partial comment. Each comment is replaced by
           one space character. New-line characters are retained. Whether each nonempty
           sequence of white-space characters other than new-line is retained or replaced by
           one space character is implementation-defined.
        4. Preprocessing directives are executed, macro invocations are expanded, and
           _Pragma unary operator expressions are executed. If a character sequence that
           matches the syntax of a universal character name is produced by token
           concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing
           directive causes the named header or source file to be processed from phase 1
           through phase 4, recursively. All preprocessing directives are then deleted.
        5. Each source character set member and escape sequence in character constants and
           string literals is converted to the corresponding member of the execution character
           set; if there is no corresponding member, it is converted to an implementation-defined
           member other than the null (wide) character. (*7)
        6. Adjacent string literal tokens are concatenated.
        7. White-space characters separating tokens are no longer significant. Each
           preprocessing token is converted into a token. The resulting tokens are
           syntactically and semantically analyzed and translated as a translation unit.
        8. All external object and function references are resolved. Library components are
           linked to satisfy external references to functions and objects not defined in the
           current translation. All such translator output is collected into a program image
           which contains information needed for execution in its execution environment.
    
    (*5) Implementations shall behave as if these separate phases occur, even though many are typically folded
         together in practice. Source files, translation units, and translated translation units need not
         necessarily be stored as files, nor need there be any one-to-one correspondence between these entities
         and any external representation. The description is conceptual only, and does not specify any
         particular implementation.
    (*6) As described in 6.4, the process of dividing a source file's characters into preprocessing tokens is
         context-dependent. For example, see the handling of < within a #include preprocessing directive.
    (*7) An implementation need not convert all non-corresponding source characters to the same execution
         character.