Parser

The Parser (c parser.c) performs an LR(1) parse of the input source using parse tables generated by the Parse Table Generator utility and tokens returned by the Scanner. For a description of the LR parsing method used, see references [6] and [7].

Refer to Appendix I for a listing of the grammar which drives the Parser.

Data Structures

Parse Stack — contains parse states and is maintained by the Parser. When a ‘read transition’ occurs, a state is added to the stack. When a reduction occurs, the stack is reduced by the number of tokens on the right hand side of the reduction minus one.

Semantic Stack — is maintained in parallel with the Parse Stack, and is described in detail in section 5 since it is only used by the Semantic Analyzer. The variable, stktop, is used to point to the current top of the semantic stack. Whenever a read transition occurs, the Parser adds to the semantic stack, information on the current token (for example symbol table pointer) which the Scanner passed to it.

Parse Tables — tables created by the utility PRSTAB and which control the parsing process. These tables are described in the PRSTAB Maintenance and Operation Document, [7].

Processing Overview

The Parser uses parse tables generated by the Parse Table Generator utility (PRSTAB), and tokens returned by the Scanner. As each parse reduction is made, the Semantic Analyzer is called to perform the function associated with the particular reduction.

The parser maintains the parse stack. Semantic actions are done in parallel with reductions involving the parse stack. Figure 4-1 gives a pseudo-code description of the operation of the Parser.

loop for each statement in subprogram unit {

    parse_init();

    repeat  /\* once for each input token in the statement \*/   {
     get token;
     while (a reduction can be made using the current
            token as a look-ahead symbol)   {
         call Semantic Analyzer for this reduction;
         if (sem.ignore_stmt) {
             /\* skip this statement \*/
             goto next_statement;
         }
         replace on the parse stack, the right hand side
              symbols by the left hand side symbol for
              this reduction;

         current_state = new value determined from
                             Parse Tables;
         stack current_state on Parse Stack;
     }

     if (there is a valid read transition)  {
         current_state = new value determined using
              current_state, token, and Parse Tables;
         stack current_state information on
              Parse Stack.
     }
     else
         error;

     if (token == end_of_line) {
          if end-of-program-unit return;
          if at end-of-file { error; return; }
          break;
     }
    }
next_statement:;
}

Figure 4-1: Parser Operation

The parse_init routine sets the Parse Stack to its initial state. parse_init calls the reset routine in the lexical analyzer to begin processing tokens from the next statement.

During semantic analysis, the semantic analyzer, while processing reductions, may detect an error for which it’s useless to process the remaining tokens in the statement. For example, if an executable statement appears in an interface, there’s no need to semantically analyze the statement. The semantic analyzer can direct the parser to ignore the remaining portion of the statement and to move on to the next statement; the global variable (a flag) for effecting this is sem.ignore_stmt.

The last token processed by the parser for a statement is the end of line ( TK_EOL ) token. When this token is seen, the parser determines if the END statement is the current statement. If it is, the parser returns to the caller (main) since this represents the end of the current subprogram.

Parse Table Generator (PRSTAB)

Overview

The main function of PRSTAB is to generate data initializations for the Parse Tables used by the Parser. In addition, it generates three files of C #define directives used to coordinate the operation of the Scanner, Parser, and Semantic Analyzer, and two files of static character information used for error messages and debugging. PRSTAB generates a listing and cross reference of the grammar which is used as Appendix I of this document.

Inputs

PRSTAB requires two input files:

  1. Grammar Definition File. This file defines the grammar used by Flang. Reference [6] describes the format of this file; a short example is given here:

    <entry statement> ::=    <entry id>  |
                       <entry id> ( ) |
                       <entry id> ( <id list> )
    
    <entry id> ::=           ENTRY <ident>
    

    If the semantic actions (see section 5) are divided into multiple C source files, this file contains a .BR directive at the points in the grammar where the breaks occur.

  2. Token Definition File. This file contains one line for each terminal symbol of the grammar (token). Each line consists of the token name as it appears in the grammar definition file, followed by the name which will be used for the C constant symbol representing the token id number within Flang. These names must differ in the first 8 characters, and begin with TK\_. Example:

    <ident> TK_IDENT
    CALL    TK_CALL
    +       TK_PLUS
    

Outputs

There are six output files:

  1. Parse Table Definition File. This file contains the C data definitions and initialization code for the Parse Tables. It is included directly into the Parser source module.

  2. Token Id Number Definition File. This file contains a series of #define directives for the token id numbers. The symbol names used are those specified in the Token Definition File.

  3. Semantic Actions Definition File. This file contains a series of #define directives for the semantic action case labels (see section 5). There is one definition for each production of the grammar. The names for these symbols are constructed from the name of the symbol on the left hand side of each production, using an algorithm described in [6].

  4. Non-terminal Number Definition File. This file contains a series of #define directives for the grammar’s non-terminal symbols. The names for these symbols are of the form NT_<name> where name is constructed from the symbol’s name in the grammar in an obvious fashion.

  5. Token Name Definition File. This file contains a statically initialized array indexed by terminal and non-terminal number giving the name (exactly as in the grammar) of each terminal and non-terminal symbol. It is used for error messages and debugging.

  6. Grammar Listing and Cross Reference File. Lists grammar with alphabetic cross reference of grammar symbols at end. Used to create Appendix I of this document.