
Crafting a Compiler in Rust: Syntactic Analysis


In this post, I'll introduce you to the fundamentals of Syntactic Analysis. Furthermore, we'll write a basic parser that takes the token stream and converts it into Abstract Syntax Trees.




Update on the Lexer

Before going into writing the parser, let me tell you about the changes I've made to the Lexer 🌟.

  • Now, the Lexer supports two more literal constant categories: Float and Boolean.

  • The literal prefix is now available. For example, in the code let number = 0i64, the 0i64 has a literal prefix of i64 and a literal value of 0.

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LiteralConstantType<'a> {
    Integer {
        value: &'a str,
        literal_prefix: Option<&'a str>,
    },
    Float {
        value: &'a str,
        literal_prefix: Option<&'a str>,
    },
    Boolean(bool),
}

/// Enumeration containing all kinds of tokens
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind<'a> {
    Identifier,
    Space,
    Comment,
    Keyword(Keyword),
    LiteralConstant(LiteralConstantType<'a>),
    Punctuator(char),
    EndOfFile,
}
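
To make this concrete, here's how the literal constant carried by the 0i64 token from the example above would look (a small sketch using only the enum defined here):

// The `0i64` token's literal constant: value `0` with the `i64` prefix
let constant = LiteralConstantType::Integer {
    value: "0",
    literal_prefix: Some("i64"),
};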

The full code changes to the pernix_lexer module are available.




Syntactic Analysis Basics

As mentioned in the introduction, Syntactic Analysis continues from where Lexical Analysis left off. But what does it do?

It takes the token stream produced in the previous phase and turns it into a meaningful data structure called an Abstract Syntax Tree (AST). Moreover, this is the phase where all the syntactic errors are caught.



The Abstract Syntax Tree

The Abstract Syntax Tree is a tree representation of the code fed to the compiler. It contains only the essential information of the code and abstracts away the syntactic details of the language.

Take the following code snippets for example:

function add(a, b) {
    return a + b;
}

def add(a, b):
    return a + b

The snippets above are JavaScript and Python. Both functions behave exactly the same. If both code snippets are parsed, they'll produce an AST similar to this:

The Abstract Syntax Tree Representation

In JSON representation:

{
  "perform": {
    "parameters": [
      "a",
      "b"
    ],
    "statements": [
      {
        "type": "ReturnStatement",
        "expression": {
          "type": "BinaryOperatorExpression",
          "left": {
            "type": "VariableExpression",
            "variable": "a"
          },
          "operator": "+",
          "right": {
            "type": "VariableExpression",
            "variable": "b"
          }
        }
      }
    ]
  }
}

The AST produced contains all the essential information about the function:

  • Its parameter names and count.
  • The statements inside the function body; it contains only one return statement, which returns the expression a + b.

Even though JavaScript and Python have different syntaxes, the ASTs produced from both languages here are similar or identical, since the AST doesn't concern itself with the language's syntax details. That's why it's called an Abstract Syntax Tree.




The Goal

Before diving into writing the parser, it's necessary to scope out what the parser will do. What features will be included? What does the code that will be parsed look like?

Therefore, I made a reference snippet of code to guide us when writing the parser:

using Simmypeet.Math;

namespace Simmypeet {
    namespace Math {
        int32 Add(int32 a, int32 b) {
            return a + b;
        }

        int32 Subtract(int32 a, int32 b) {
            return a - b;
        }
    }

    int32 Fibonacci(int32 n) {
        if (n == 0) {
            return 0;
        }
        else if (n == 1) {
            return 1;
        } 
        else {
            return Add(Fibonacci(n - 1), Fibonacci(n - 2));
        }
    }
}

The code above is similar to C# 😁. It also tells us a lot about what features we have to implement:

  • If-else statement
  • Namespace
  • Using declaration (bringing a namespace into scope)
  • Function call
  • Arithmetic operation
  • Comparison
  • Function declaration

The ultimate goal is to implement a parser that correctly parses the above code. Now, it's time to write some code and get our hands dirty!




The pernix_parser Crate

Let's start by creating a new crate named pernix_parser.

cargo new --lib compiler/pernix_parser

Then, register the new crate as a member of the workspace in the root Cargo.toml:

[workspace]
members = [
    "compiler/pernix_project",
    "compiler/pernix_lexer",
    "compiler/pernix_parser",
]

Before implementing the actual parser, it's necessary to write all the Abstract Syntax Tree data structures first.



The abstract_syntax_tree Module

We'll create a new module called abstract_syntax_tree in the pernix_parser crate. This module holds all the data structures relating to the Abstract Syntax Tree.

When producing an AST, it's useful to include the source code position from which each node was parsed. That way, in later phases, when errors occur, the error messages can include the position in the source code that produced them.

Therefore, in pernix_parser/abstract_syntax_tree.rs, we'll write a wrapper struct consisting of two fields: the abstract syntax tree value and its position range.

use std::ops::Range;

/// A wrapper around a value that also contains the position of the value in
/// the source code
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PositionWrapper<T> {
    pub position: Range<SourcePosition>,
    pub value: T,
}
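
For example, an identifier add spanning the first three columns of a line could be wrapped like this (a sketch; SourcePosition comes from the pernix_lexer crate, and its line/column shape is an assumption here):

// Hypothetical usage: `SourcePosition`'s line/column fields are assumed
let wrapped = PositionWrapper {
    position: SourcePosition { line: 1, column: 1 }
        ..SourcePosition { line: 1, column: 4 },
    value: "add".to_string(),
};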

Next, let's write an enum that holds all the binary operators.

/// List of all available binary operators
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinaryOperator {
    Add,
    Subtract,
    Asterisk,
    Slash,
    Percent,
    LessThan,
    GreaterThan,
    LessThanEqual,
    GreaterThanEqual,
    Equal,
    NotEqual,
    And,
    Or,
}

Next, write an enum that contains all the unary operators.

/// List of all available unary operators
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnaryOperator {
    Plus,
    Minus,
    LogicalNot,
}

Next, include another enum with all the ways to refer to a type.

/// An enumeration containing all forms of types referenced in the program
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Type {
    Identifier(String),
    Array {
        element_type: Box<Type>,
        size: usize,
    },
}

For example, code containing int32[5], an array of 5 int32s, would produce a Type value similar to this:

Type::Array {
    element_type: Box::new(Type::Identifier("int32".to_string())),
    size: 5,
}


Next, let's create a submodule inside the abstract_syntax_tree module called expression. This submodule contains all the code related to the expression Abstract Syntax Tree.

An expression is code in the program that can be evaluated to yield a value. For example, in the statement x = 4 + 3, the 4 + 3 is an expression.

Now, let's write an enum containing all the forms of an expression.

/// Enumeration containing all possible expressions
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expression<'a> {
    /// Represents an expression of the form `left operator right`
    BinaryExpression {
        left: Box<PositionWrapper<Expression<'a>>>,
        operator: PositionWrapper<BinaryOperator>,
        right: Box<PositionWrapper<Expression<'a>>>,
    },

    /// Represents an expression of the form `operator operand`
    UnaryExpression {
        operator: PositionWrapper<UnaryOperator>,
        operand: Box<PositionWrapper<Expression<'a>>>,
    },

    /// Represents an expression of the form `literal`
    LiteralExpression(LiteralConstantType<'a>),

    /// Represents an expression of the form `identifier`
    IdentifierExpression {
        identifier: PositionWrapper<String>,
    },

    /// Represents an expression of the form `function_name(arguments)`
    FunctionCallExpression {
        function_name: PositionWrapper<String>,
        arguments: Vec<PositionWrapper<Expression<'a>>>,
    },
}

  • The BinaryExpression variant consists of two operand expressions separated by one binary operator, such as 4 + 5 (see the sketch after this list).
  • The UnaryExpression variant consists of an operand expression and a unary operator, for example, !true.
  • The LiteralExpression is straightforward; it's the expression yielded from a hardcoded value.
  • The IdentifierExpression refers to a reference to a variable.
  • The FunctionCallExpression refers to a call to a particular function by name, providing arguments.
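
Here's roughly how the 4 + 5 expression from the first bullet nests once built (a sketch: the pos helper that fabricates dummy position ranges is hypothetical, and it assumes SourcePosition is a Copy line/column pair as in the lexer post):

// Hypothetical helper: wraps a value with a dummy position range
// (assumes `SourcePosition` is a `Copy` line/column pair).
fn pos<T>(value: T) -> PositionWrapper<T> {
    let dummy = SourcePosition { line: 1, column: 1 };
    PositionWrapper { position: dummy..dummy, value }
}

// `4 + 5` as an Expression tree
let four_plus_five = Expression::BinaryExpression {
    left: Box::new(pos(Expression::LiteralExpression(
        LiteralConstantType::Integer { value: "4", literal_prefix: None },
    ))),
    operator: pos(BinaryOperator::Add),
    right: Box::new(pos(Expression::LiteralExpression(
        LiteralConstantType::Integer { value: "5", literal_prefix: None },
    ))),
};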

Next, let's write all the variants of a statement.

/// Enumeration containing all forms of statements
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Statement<'a> {
    /// Represents a statement of the form `return expression;`
    ReturnStatement {
        expression: PositionWrapper<Expression<'a>>,
    },

    /// Represents a statement of the form `expression;`
    ExpressionStatement {
        expression: PositionWrapper<Expression<'a>>,
    },

    /// Represents a statement of the form `let identifier = expression;`
    VariableDeclarationStatement {
        identifier: PositionWrapper<String>,
        expression: PositionWrapper<Expression<'a>>,
    },

    /// Represents an if statement of the form `if (condition) then_statement
    /// else else_statement`
    IfStatement {
        condition: PositionWrapper<Expression<'a>>,
        then_statement: Vec<PositionWrapper<Statement<'a>>>,
        else_statement: Option<Vec<PositionWrapper<Statement<'a>>>>,
    },
}

A statement is a fragment of code that carries out a specific action in the program.

  • The ReturnStatement variant represents a statement that returns a value from a function (see the sketch after this list).
  • The ExpressionStatement variant represents a statement that evaluates an expression.
  • The VariableDeclarationStatement variant represents a statement that declares a new variable.
  • The IfStatement variant represents a statement that executes a block of code if a condition is true. It can also execute a different block of code if the condition is false.
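
And here's the return a + b; statement from the JSON example earlier, built with the same hypothetical pos helper from the expression sketch above:

// `return a + b;` as a Statement, reusing the hypothetical `pos` helper
let return_statement = Statement::ReturnStatement {
    expression: pos(Expression::BinaryExpression {
        left: Box::new(pos(Expression::IdentifierExpression {
            identifier: pos("a".to_string()),
        })),
        operator: pos(BinaryOperator::Add),
        right: Box::new(pos(Expression::IdentifierExpression {
            identifier: pos("b".to_string()),
        })),
    }),
};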

Next, let's create a new submodule inside the abstract_syntax_tree module called declaration. This module holds all the AST code related to the declarations you'd define in a program, such as namespace declarations, functions, and classes.

/// A declaration is a statement that declares a new name in the current scope.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Declaration<'a> {
    /// Represents a namespace declaration of the form `namespace namespace_name { declaration* }`
    NamespaceDeclaration {
        namespace_name: PositionWrapper<String>,
        declarations: Vec<PositionWrapper<Declaration<'a>>>,
    },

    /// Represents a function declaration of the form `type function_name(parameters) { statement* }`
    FunctionDeclaration {
        function_name: PositionWrapper<String>,
        parameters: Vec<PositionWrapper<String>>,
        return_type: PositionWrapper<Type>,
        body: Vec<PositionWrapper<Statement<'a>>>,
    },

    /// Represents a using statement of the form `using namespace_name;`
    UsingStatement {
        namespace_name: PositionWrapper<String>,
    },
}

  • The NamespaceDeclaration variant represents a declaration that declares a new namespace and contains other declarations inside it.
  • The FunctionDeclaration variant represents a declaration that declares a new function and contains a list of statements inside it.
  • The UsingStatement variant represents a declaration that imports a namespace into the current scope.



The error Module

Now, it's time to implement the error module. This module contains all the error types that we'll use in the parser.

/// Enumeration containing all possible contexts in which an error can occur.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Context {
    Program,
    Statement,
    Expression,
    Namespace,
}

/// Enumeration containing all possible errors that can occur during parsing.
#[derive(Debug, Clone)]
pub enum Error<'a> {
    LexicalError(pernix_lexer::error::Error<'a>),
    KeywordExpected {
        expected_keyword: Keyword,
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
    IdentifierExpected {
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
    PunctuatorExpected {
        expected_punctuator: char,
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
    UnexpectedToken {
        context: Context,
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
}

  • The Context enumeration contains all the possible contexts in which an error can occur. For example, if an error occurs in the Program context, it means the error occurred at the top level of the program.

  • The Error enumeration contains all the possible errors that can occur during parsing. The LexicalError variant wraps an error that occurred during lexical analysis. The other variants represent errors that occurred during parsing itself.




Writing The Parser

The parser is a state machine that behaves similarly to the lexer. It turns the token stream into a meaningful AST data structure. The parser we'll write is a combination of Recursive Descent Parsing and Operator-Precedence Parsing.
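
The operator-precedence part only comes into play when we parse expressions (next post), but the core idea is simple enough to sketch now: each binary operator gets a binding power, and the expression parser keeps recursing only while the next operator binds tighter. A possible precedence table for our BinaryOperator enum might look like this (the exact numbers are an assumption, not something fixed by this post):

impl BinaryOperator {
    /// Returns the binding power of the operator; higher binds tighter.
    /// (Assumed ordering: multiplicative > additive > comparison >
    /// equality > logical and > logical or.)
    fn precedence(&self) -> u8 {
        match self {
            BinaryOperator::Asterisk
            | BinaryOperator::Slash
            | BinaryOperator::Percent => 5,
            BinaryOperator::Add | BinaryOperator::Subtract => 4,
            BinaryOperator::LessThan
            | BinaryOperator::GreaterThan
            | BinaryOperator::LessThanEqual
            | BinaryOperator::GreaterThanEqual => 3,
            BinaryOperator::Equal | BinaryOperator::NotEqual => 2,
            BinaryOperator::And => 1,
            BinaryOperator::Or => 0,
        }
    }
}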

/// Represents a struct that parses the given token stream into abstract
/// syntax trees
pub struct Parser<'a> {
    // The lexer that is used to generate the token stream
    lexer: Lexer<'a>,
    // the accumulated tokens that have been tokenized by the lexer so far
    accumulated_tokens: Vec<Token<'a>>,
    // the accumulated errors that have been found during parsing so far
    accumulated_errors: Vec<Error<'a>>,
    // The current position in the token stream
    current_position: usize,
    // Flag that indicates whether the parser should push errors into the
    // list or not
    produce_errors: bool,
}

In the following list, I'll explain the fields of the Parser struct.

  • The lexer field is used to generate the token stream. It's the struct that
    we already implemented in the previous post.
  • The accumulated_tokens field stores the tokens that have been
    tokenized by the lexer so far. The parser uses this field to access tokens
    already tokenized by the lexer.
  • The accumulated_errors field stores the errors
    found during parsing.
  • The current_position field stores the current token stream index.
  • The produce_errors field indicates whether the parser should
    push errors into the list or not. This field prevents the parser
    from producing errors while recovering from an error.

Let’s create a new constructor for the Parser struct.

impl<'a> Parser<'a> {
    /// Creates a new parser instance from the given source code
    pub fn new(source_code: &'a SourceCode) -> Self {
        let mut lexer = Lexer::new(source_code);
        let mut accumulated_errors = Vec::new();

        // get the first token
        let accumulated_tokens = vec![{
            let token_return;
            loop {
                match lexer.lex() {
                    Ok(token) => {
                        token_return = token;
                        break;
                    }
                    Err(err) => {
                        accumulated_errors.push(Error::LexicalError(err));
                    }
                }
            }
            token_return
        }];

        Self {
            lexer,
            accumulated_tokens,
            accumulated_errors,
            current_position: 0,
            produce_errors: true,
        }
    }
}

The function creates a new Lexer instance from the source code. It then gets the first token from the lexer and stores it in the accumulated_tokens field. Finally, it creates a new Parser instance and returns it.

Now, let's implement the next function. This function returns the current token and moves the current_position field forward. If current_position points at the last buffered token, the parser has run out of buffered tokens; in that case, the function lexes the next token and appends it to the accumulated_tokens field. Moreover, if the lexer encounters an error, the function stores the error in the accumulated_errors field.

impl<'a> Parser<'a> {
    // Gets the current token and moves the current position forward
    pub(crate) fn next(&mut self) -> &Token<'a> {
        // need to generate more tokens
        if self.current_position == self.accumulated_tokens.len() - 1 {
            let new_token;
            loop {
                match self.lexer.lex() {
                    Ok(token) => {
                        new_token = token;
                        break;
                    }
                    Err(err) => {
                        self.accumulated_errors.push(Error::LexicalError(err));
                    }
                }
            }

            // append the new token to the accumulated tokens
            self.accumulated_tokens.push(new_token);
        }
        // panic if the current position is greater than or equal to the length
        else if self.current_position >= self.accumulated_tokens.len() {
            panic!("current position is greater than or equal to the length of the accumulated tokens");
        }

        // increment the current position
        self.current_position += 1;

        &self.accumulated_tokens[self.current_position - 1]
    }
}

Now, let's implement the peek function. This function simply indexes the accumulated_tokens field with the current_position field. It doesn't move the position forward.

impl<'a> Parser<'a> {
    // Gets the current token without moving the current position forward
    fn peek(&self) -> &Token<'a> {
        &self.accumulated_tokens[self.current_position]
    }

    // Gets the token one position back without moving the current position
    fn peek_back(&self) -> &Token<'a> {
        &self.accumulated_tokens[self.current_position - 1]
    }
}

Before writing the parsing functions, let's discuss insignificant tokens. In the previous post, we implemented the lexer so that it tokenizes comments and whitespace into tokens. However, most of the time, the syntax doesn't care about whitespace and comments. For example, the following code is valid in the language.

let x =   5;          // extra spaces? it's okay!
let x=5;              // no spaces? it's okay too!
let x = /*hey?*/ 5; // a comment in the middle? that's fine!

However, in some cases, whitespace can be significant. For example, the only difference between the comparison operator == and two consecutive assignment operators = = is the space between them. In the following code, the first line assigns the value 5 to the variable x; the second line compares the value of x with 5; the rest are invalid.

x = 5;            // sure
x == 5;           // yeah
x = = 5;          // ???
x =/*hey?*/= 5; // what?

Therefore, we need to implement a function that skips insignificant tokens. The function is called move_to_significant and is implemented as follows.

impl<'a> Parser<'a> {
    /// Moves the current position to the next significant token
    pub(crate) fn move_to_significant(&mut self) {
        while !self.peek().is_significant_token() {
            self.next();
        }
    }
}

On top of that, let's implement a function that moves to the next significant token, returns it, and moves the position forward.

impl<'a> Parser<'a> {
    // Moves the current position to the next significant token and emits it
    pub(crate) fn next_significant(&mut self) -> &Token<'a> {
        self.move_to_significant();
        self.next()
    }
}

If you're wondering how is_significant_token is implemented, here it is.

impl<'a> Token<'a> {
    /// Returns a boolean indicating whether this [`Token`] is significant.
    /// Insignificant tokens are spaces and comments.
    pub fn is_significant_token(&self) -> bool {
        match self.token_kind() {
            TokenKind::Space | TokenKind::Comment => false,
            _ => true,
        }
    }
}

Next, let's implement the function that appends an error to the accumulated_errors field if the produce_errors field is true.

impl<'a> Parser<'a> {
    // Appends the given error to the list of accumulated errors if the
    // `produce_errors` flag is set to true, and returns `None` so callers
    // can bail out early.
    #[must_use]
    pub fn append_error<T>(&mut self, error: Error<'a>) -> Option<T> {
        if self.produce_errors {
            self.accumulated_errors.push(error);
        }

        None
    }
}

Now that we've got the basic utilities, let's implement the parsing functions.



Parsing The Qualified Name

Let's start with something simple: the function that parses a qualified name. A qualified name is a scope's name, such as Simmypeet.Compiler.Parser. It follows the pattern of an identifier followed by a dot, followed by another identifier, and so on.

pub fn parse_qualified_name(&mut self) -> Option<PositionWrapper<String>> {
    self.move_to_significant();
    let starting_position = self.peek().position_range().start;

    // the string to return
    let mut string = String::new();

    // expect the first identifier
    if let TokenKind::Identifier = self.next().token_kind() {
        string.push_str(self.peek_back().lexeme());
    } else {
        return self.append_error(Error::IdentifierExpected {
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }

    // found additional scopes
    while matches!(self.peek().token_kind(), TokenKind::Punctuator('.')) {
        // eat the dot
        self.next();

        string.push('.');

        // expect the next identifier
        if let TokenKind::Identifier = self.next().token_kind() {
            string.push_str(self.peek_back().lexeme());
        } else {
            return self.append_error(Error::IdentifierExpected {
                found_token: self.peek_back().clone(),
                source_reference: self.source_code(),
            });
        }
    }

    Some(PositionWrapper {
        position: starting_position..self.peek_back().position_range().end,
        value: string,
    })
}

The function first moves the position to the next significant token, since the parser's position isn't always on a significant token. Then, it records the starting position of the qualified name. This pattern will be used in the rest of the parsing functions.

It then expects the first identifier and stores it in the string variable. If the first token is not an identifier, it records an error and returns None.

Finally, it checks whether a dot follows the identifier. If it does, it eats the dot, eats the next identifier, and pushes both onto the string variable. It then checks again whether the next token is a dot; if it is, it repeats the process. If it isn't, it returns the accumulated string wrapped with its position.
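
As a quick sanity check, here's how the function could be exercised (a hypothetical test sketch; the SourceCode constructor's exact signature is an assumption based on the previous posts):

// Hypothetical test: `SourceCode::new`'s signature is assumed here
let source_code = SourceCode::new(
    "Simmypeet.Compiler.Parser".to_string(),
    "test".to_string(),
);
let mut parser = Parser::new(&source_code);
let name = parser.parse_qualified_name().expect("should parse");
assert_eq!(name.value, "Simmypeet.Compiler.Parser");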



Parsing The Using Statement

Next, let's implement the function that parses the using statement. The statement uses the qualified name parsing we implemented in the previous section. The parser expects the using keyword, followed by a qualified name, followed by a semicolon.

/// Parses the current token stream as a using statement
///
/// Using_Statement:
///   `using` Qualified_Name `;`
pub fn parse_using_statement(
    &mut self,
) -> Option<PositionWrapper<Declaration<'a>>> {
    // move to the first significant token
    self.move_to_significant();
    let starting_position = self.peek().position_range().start;

    // expect the `using` keyword
    if !matches!(
        self.next().token_kind(),
        TokenKind::Keyword(Keyword::Using)
    ) {
        return self.append_error(Error::KeywordExpected {
            expected_keyword: Keyword::Using,
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }

    let qualified_name = self.parse_qualified_name()?;

    // expect the semicolon
    if !matches!(
        self.next_significant().token_kind(),
        TokenKind::Punctuator(';')
    ) {
        return self.append_error(Error::PunctuatorExpected {
            expected_punctuator: ';',
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }

    Some(PositionWrapper {
        position: starting_position..self.peek_back().position_range().end,
        value: Declaration::UsingStatement {
            namespace_name: qualified_name,
        },
    })
}



Parsing The Namespace Declaration

This one is a bit more complicated. Before we start, let's settle on our error recovery strategy. Naively, we could stop parsing as soon as the parser encounters an error. However, this isn't a good idea, as it discards the rest of the possibly valid tokens. Instead, we can use an error recovery strategy to continue parsing the rest of the tokens after the error.

We'll use the Panic Mode Error Recovery strategy in our case. In this strategy, after the parser encounters an error, it looks for the next valid token and continues from there, discarding the invalid tokens. It's the most widely used error recovery strategy, as it's simple to implement and effective.

Let's put ourselves in the parser's perspective. Consider the following code:

mod simmypeet {
    use something;
    use something_else;

    1234 420 // Oops! this shouldn't be here

    mod name {}
}

As you can see, there are number literals in the middle of the module declaration. When the parser encounters them, it records an error saying something like "unexpected token; a number literal is not expected here". Then, it looks for the next valid token, which is the mod keyword. Our parser will not simply continue parsing from the 1234 token only to encounter the 420 token and record yet another error. Instead, it skips both the 1234 and 420 tokens and continues parsing from the following mod name declaration.

Now that we've settled on an error recovery strategy, let's implement the namespace declaration parsing function. The parser expects the namespace keyword, followed by a qualified name, followed by an opening curly brace.

/// Parses the current token stream as a namespace declaration
///
/// Namespace_Declaration:
///    `namespace` Qualified_Name `{` Declaration* `}`
pub fn parse_namespace_declaration(
    &mut self,
) -> Option<PositionWrapper<Declaration<'a>>> {
    // move to the first significant token
    self.move_to_significant();
    let starting_position = self.peek().position_range().start;

    // expect the `namespace` keyword
    if !matches!(
        self.next().token_kind(),
        TokenKind::Keyword(Keyword::Namespace)
    ) {
        return self.append_error(Error::KeywordExpected {
            expected_keyword: Keyword::Namespace,
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }

    let namespace_name = self.parse_qualified_name()?;

    // expect the opening curly bracket
    if !matches!(
        self.next_significant().token_kind(),
        TokenKind::Punctuator('{')
    ) {
        return self.append_error(Error::PunctuatorExpected {
            expected_punctuator: '{',
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }

    /* More to come ... */
}

Next, we'll instantiate a new Vec to store the declarations inside the namespace. Then, we'll start a loop that parses the declarations inside the namespace. The loop terminates when the parser encounters the closing curly brace or reaches the end of the file.

/// Parses the current token stream as a namespace declaration
///
/// Namespace_Declaration:
///    `namespace` Qualified_Name `{` Declaration* `}`
pub fn parse_namespace_declaration(
    &mut self,
) -> Option<PositionWrapper<Declaration<'a>>> {
    /* From the previous code ... */

    // the declarations inside the namespace
    let mut declarations = Vec::new();

    // move to the next significant token
    self.move_to_significant();

    // loop through all the declarations until a closing curly bracket or
    // EOF is found
    while !matches!(
        self.peek().token_kind(),
        TokenKind::Punctuator('}') | TokenKind::EndOfFile
    ) {
        let declaration = match self.peek().token_kind() {
            TokenKind::Keyword(Keyword::Namespace) => {
                self.parse_namespace_declaration()
            }
            TokenKind::Keyword(Keyword::Using) => {
                self.parse_using_statement()
            }
            _ => self.append_error(Error::UnexpectedToken {
                context: Context::Namespace,
                found_token: self.peek().clone(),
                source_reference: self.source_code(),
            }),
        };

        if let Some(declaration) = declaration {
            declarations.push(declaration);
        } else {
            // error recovery: skip to the next valid declaration or the
            // closing curly bracket
            self.skip_to(|token| {
                matches!(
                    token.token_kind(),
                    TokenKind::Punctuator('}')
                        | TokenKind::Keyword(Keyword::Namespace)
                        | TokenKind::Keyword(Keyword::Using)
                )
            });
        }

        // move to the next significant token
        self.move_to_significant();
    }

    /* More to come ... */
}

You might have noticed that we're using the skip_to method to skip to the next available declaration or the closing curly brace. This helper method skips tokens until the given predicate returns true. In our case, we skip tokens until we encounter the namespace keyword, the using keyword, or the } token.
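
The post hasn't shown skip_to itself, so here's a minimal sketch of how it might be implemented based on the description above (the predicate-taking signature is an assumption):

impl<'a> Parser<'a> {
    // Skips tokens until the predicate returns true for the current token,
    // or until the end of the file is reached.
    pub(crate) fn skip_to(&mut self, predicate: impl Fn(&Token<'a>) -> bool) {
        while !matches!(self.peek().token_kind(), TokenKind::EndOfFile)
            && !predicate(self.peek())
        {
            self.next();
        }
    }
}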

Finally, we'll expect the closing curly brace and return the namespace declaration.

/// Parses the current token stream as a namespace declaration
///
/// Namespace_Declaration:
///    `namespace` Qualified_Name `{` Declaration* `}`
pub fn parse_namespace_declaration(
    &mut self,
) -> Option<PositionWrapper<Declaration<'a>>> {
    /* From the previous code ... */

    // expect the closing curly bracket
    if !matches!(
        self.next_significant().token_kind(),
        TokenKind::Punctuator('}')
    ) {
        return self.append_error(Error::PunctuatorExpected {
            expected_punctuator: '}',
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }

    Some(PositionWrapper {
        position: starting_position..self.peek_back().position_range().end,
        value: Declaration::NamespaceDeclaration {
            namespace_name,
            declarations,
        },
    })
}



Parsing The Program

Finally, we'll write the parse_program method that parses the entire program. Fortunately, a program is just a list of declarations, so we simply loop through all the declarations and parse them similarly to how we parsed the namespace declaration body.

/// Parses the whole token stream as a program
///
/// Program:
///  Declaration*
pub fn parse_program(&mut self) -> Option<Program<'a>> {
    let mut program = Program {
        declarations: Vec::new(),
    };

    // move to the next significant token
    self.move_to_significant();

    while !matches!(self.peek().token_kind(), TokenKind::EndOfFile) {
        let declaration = match self.peek().token_kind() {
            TokenKind::Keyword(Keyword::Namespace) => {
                self.parse_namespace_declaration()
            }
            TokenKind::Keyword(Keyword::Using) => {
                self.parse_using_statement()
            }
            _ => self.append_error(Error::UnexpectedToken {
                context: Context::Program,
                found_token: self.peek().clone(),
                source_reference: self.source_code(),
            }),
        };

        if let Some(declaration) = declaration {
            program.declarations.push(declaration);
        } else {
            // error recovery: skip to the next valid declaration
            self.skip_to(|token| {
                matches!(
                    token.token_kind(),
                    TokenKind::Keyword(Keyword::Namespace)
                        | TokenKind::Keyword(Keyword::Using)
                )
            });
        }

        // move to the next significant token
        self.move_to_significant();
    }

    Some(program)
}




Summary

Phew!! 😮‍💨😮‍💨 We've created a simple language parser that can understand basic program structures. But we're not done yet; in the next post, we'll continue writing the parser, covering function declarations.

As always, you can find the source code for this post on GitHub.

Feel free to comment if you have any thoughts, questions, feedback, or suggestions. I'll be happy to read them 😊.
