In this post, I'll introduce you to the fundamentals of Syntactic Analysis. Furthermore, we'll write a basic parser that takes the token stream and converts it into Abstract Syntax Trees.
Update on the Lexer
Before going into writing the parser, let me tell you about the changes I've made to the Lexer 🌟.
- Now, the Lexer supports two more literal constant categories: `Float` and `Boolean`.
- The literal prefix is now available. For example, in the code `let number = 0i64`, the `0i64` has a literal prefix of `i64` and a literal value of `0`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LiteralConstantType<'a> {
    Integer {
        value: &'a str,
        literal_prefix: Option<&'a str>,
    },
    Float {
        value: &'a str,
        literal_prefix: Option<&'a str>,
    },
    Boolean(bool),
}
/// Enumeration containing all patterns of token
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind<'a> {
    Identifier,
    Space,
    Comment,
    Keyword(Keyword),
    LiteralConstant(LiteralConstantType<'a>),
    Punctuator(char),
    EndOfFile,
}
The full code changes to the `pernix_lexer` module are available.
Syntactic Analysis Basics
As mentioned in the introduction, Syntactic Analysis continues from the Lexical Analysis phase. But what does it do?
It turns the token stream produced in the previous phase into a meaningful data structure called an Abstract Syntax Tree (AST). Moreover, this is the phase where all the syntactic errors are caught.
The Abstract Syntax Tree
The Abstract Syntax Tree is a tree representation of the code fed to the compiler. It contains only the essential information of the code and abstracts away the syntactic details of the language.
Take the following code snippets for example:
function add(a, b) {
    return a + b;
}
def add(a, b):
return a + b
There's JavaScript and Python code above. Both functions behave exactly the same. If both code snippets are parsed, they'll produce an AST similar to this:
In JSON representation:
{
"function": {
"parameters": [
"a",
"b"
],
"statements": [
{
"type": "ReturnStatement",
"expression": {
"type": "BinaryOperatorExpression",
"left": {
"type": "VariableExpression",
"variable": "a"
},
"operator": "+",
"right": {
"type": "VariableExpression",
"variable": "b"
}
}
}
]
}
}
The AST produced contains all the essential details about the function.
- Its parameter names and count.
- The statement inside the function body; it contains only one return statement, which returns the expression `a + b`.
Even though JavaScript and Python have different syntaxes, the ASTs produced from both languages here would be similar or identical, since an AST doesn't consider the language's syntax details. That's why it's called an Abstract Syntax Tree.
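To make this concrete, here's a minimal Rust sketch of the tree that both `add` functions collapse into. The `Expr` and `Function` types here are hypothetical, far simpler than the ones we'll define later in this post:

```rust
// Hypothetical, simplified AST types for illustration only.
#[derive(Debug, PartialEq)]
enum Expr {
    Variable(String),
    Binary {
        left: Box<Expr>,
        operator: char,
        right: Box<Expr>,
    },
}

#[derive(Debug, PartialEq)]
struct Function {
    parameters: Vec<String>,
    // the body is just a single returned expression, for brevity
    return_expr: Expr,
}

// Builds the AST for `add(a, b) { return a + b; }` — the same tree
// regardless of whether the source was JavaScript or Python.
fn add_ast() -> Function {
    Function {
        parameters: vec!["a".to_string(), "b".to_string()],
        return_expr: Expr::Binary {
            left: Box::new(Expr::Variable("a".to_string())),
            operator: '+',
            right: Box::new(Expr::Variable("b".to_string())),
        },
    }
}

fn main() {
    let ast = add_ast();
    assert_eq!(ast.parameters.len(), 2);
    println!("{:?}", ast);
}
```

Note how nothing in the tree records braces, colons, or the `function`/`def` keyword; only the structure survives.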
The Goal
Before diving into writing the parser, it's necessary to scope the role of the parser. Which features will be included? What will the code that will be parsed look like?
Therefore, I made a reference snippet of code to guide us when writing the parser:
using Simmypeet.Math;
namespace Simmypeet {
namespace Math {
int32 Add(int32 a, int32 b) {
return a + b;
}
int32 Subtract(int32 a, int32 b) {
return a - b;
}
}
int32 Fibonacci(int32 n) {
if (n == 0) {
return 0;
}
else if (n == 1) {
return 1;
}
else {
return Add(Fibonacci(n - 1), Fibonacci(n - 2));
}
}
}
The code above is similar to C# 😁. It also tells us a lot about which features we have to implement.
- If-else statements
- Namespaces
- Using declarations (bringing a namespace into scope)
- Function calls
- Arithmetic operations
- Comparisons
- Function declarations
The ultimate goal is to implement a parser that correctly parses the above code. Now, it's time to write some code and get our hands dirty!
The `pernix_parser` Crate
Let's start by creating a new crate named `pernix_parser`.
cargo new --lib compiler/pernix_parser
[workspace]
members = [
"compiler/pernix_project",
"compiler/pernix_lexer",
"compiler/pernix_parser",
]
Before going into implementing the actual parser, it's necessary to write all sorts of Abstract Syntax Tree data structures first.
The `abstract_syntax_tree` Module
We'll create a new module called `abstract_syntax_tree` in the `pernix_parser` crate. This module holds all the data structures relating to the Abstract Syntax Tree.
When producing an AST, it's useful to include the source code position from which the parser produced each AST node. That way, in later phases, when errors occur, the error message can include the position in the source code that produced the errors.
Therefore, in `pernix_parser/abstract_syntax_tree.rs`, we'll write a wrapper struct consisting of two fields: the abstract syntax tree value and its position range.
/// A wrapper around a value that also contains the position of the value in
/// the source code
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PositiionWrapper<T> {
    pub position: Range<SourcePosition>,
    pub value: T,
}
Next, let's write an enum that holds all the values of binary operators.
/// List of all available binary operators
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinaryOperator {
    Add,
    Subtract,
    Asterisk,
    Slash,
    Percent,
    LessThan,
    GreaterThan,
    LessThanEqual,
    GreaterThanEqual,
    Equal,
    NotEqual,
    And,
    Or,
}
Next, write an enum that contains all the values of unary operators.
/// List of all available unary operators
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnaryOperator {
    Plus,
    Minus,
    LogicalNot,
}
Next, include another enum with all the ways to refer to a type.
/// An enumeration containing all kinds of types referenced in the program
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Type {
    Identifier(String),
    Array {
        element_type: Box<Type>,
        size: usize,
    },
}
For example, code with `int32[5]`, an array of 5 `int32`s, would produce a `Type` value similar to this:
Type::Array {
    element_type: Box::new(Type::Identifier("int32".to_string())),
    size: 5
}
Next, let's create a submodule inside the `abstract_syntax_tree` module called `expression`. This submodule contains all the code related to the Expression Abstract Syntax Tree.
An expression is code in the program that can be evaluated to yield a value. For example, in the `x = 4 + 3` statement, the `4 + 3` is an expression.
Now, let's write an enum containing all the forms of an expression.
/// Enumeration containing all possible expressions
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expression<'a> {
    /// Represents an expression of the form `left operator right`
    BinaryExpression {
        left: Box<PositiionWrapper<Expression<'a>>>,
        operator: PositiionWrapper<BinaryOperator>,
        right: Box<PositiionWrapper<Expression<'a>>>,
    },
    /// Represents an expression of the form `operator operand`
    UnaryExpression {
        operator: PositiionWrapper<UnaryOperator>,
        operand: Box<PositiionWrapper<Expression<'a>>>,
    },
    /// Represents an expression of the form `literal`
    LiteralExpression(LiteralConstantType<'a>),
    /// Represents an expression of the form `identifier`
    IdentifierExpression {
        identifier: PositiionWrapper<String>,
    },
    /// Represents an expression of the form `function_name(arguments)`
    FunctionCallExpression {
        function_name: PositiionWrapper<String>,
        arguments: Vec<PositiionWrapper<Expression<'a>>>,
    },
}
- The `BinaryExpression` variant consists of two operand expressions separated by one binary operator, such as `4 + 5`.
- The `UnaryExpression` variant consists of an operand expression and a unary operator, for example, `!true`.
- The `LiteralExpression` variant is straightforward; it's an expression yielded from a hardcoded value.
- The `IdentifierExpression` variant refers to a reference to a variable.
- The `FunctionCallExpression` variant refers to a call to a particular function by name, providing arguments.
Next, let's write all the variants of statements.
/// Enumeration containing all kinds of statements
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Statement<'a> {
    /// Represents a statement of the form `return expression;`
    ReturnStatement {
        expression: PositiionWrapper<Expression<'a>>,
    },
    /// Represents a statement of the form `expression;`
    ExpressionStatement {
        expression: PositiionWrapper<Expression<'a>>,
    },
    /// Represents a statement of the form `let identifier = expression;`
    VariableDeclarationStatement {
        identifier: PositiionWrapper<String>,
        expression: PositiionWrapper<Expression<'a>>,
    },
    /// Represents an if statement of the form `if (condition) then_statement
    /// else else_statement`
    IfStatement {
        condition: PositiionWrapper<Expression<'a>>,
        then_statement: Vec<PositiionWrapper<Statement<'a>>>,
        else_statement: Option<Vec<PositiionWrapper<Statement<'a>>>>,
    },
}
A statement is a fragment of code that carries out a specific action in the program.
- The `ReturnStatement` variant represents a statement that returns a value from a function.
- The `ExpressionStatement` variant represents a statement that evaluates an expression.
- The `VariableDeclarationStatement` variant represents a statement that declares a new variable.
- The `IfStatement` variant represents a statement that executes a block of code if a condition is true. It can also execute a different block of code if the condition is false.
Next, let's create a new submodule inside the `abstract_syntax_tree` module called `declaration`. This module holds all the AST code related to the declarations you would define in a program, such as namespace declarations, functions, and classes.
/// A declaration is a statement that declares a new name in the current scope.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Declaration<'a> {
    /// Represents a namespace declaration of the form `namespace namespace_name { declarations* }`
    NamespaceDeclaration {
        namespace_name: PositiionWrapper<String>,
        declarations: Vec<PositiionWrapper<Declaration<'a>>>,
    },
    /// Represents a function declaration of the form `type function_name(parameters) { statements* }`
    FunctionDeclaration {
        function_name: PositiionWrapper<String>,
        parameters: Vec<PositiionWrapper<String>>,
        return_type: PositiionWrapper<Type>,
        body: Vec<PositiionWrapper<Statement<'a>>>,
    },
    /// Represents a namespace using statement of the form `using namespace_name;`
    UsingStatement {
        namespace_name: PositiionWrapper<String>,
    },
}
- The `NamespaceDeclaration` variant represents a declaration that declares a new namespace and contains other declarations inside it.
- The `FunctionDeclaration` variant represents a declaration that declares a new function and contains a list of statements inside it.
- The `UsingStatement` variant represents a declaration that imports a namespace into the current scope.
The `error` Module
Now, it's time to implement an `error` module. This module will contain all the error types that we'll use in the parser.
/// Enumeration containing all possible contexts that the error can occur in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Context {
    Program,
    Statement,
    Expression,
    Namespace,
}
/// Enumeration containing all possible errors that can occur during parsing.
#[derive(Debug, Clone)]
pub enum Error<'a> {
    LexicalError(pernix_lexer::error::Error<'a>),
    KeywordExpected {
        expected_keyword: Keyword,
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
    IdentifierExpected {
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
    PunctuatorExpected {
        expected_punctuator: char,
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
    UnexpectedToken {
        context: Context,
        found_token: Token<'a>,
        source_reference: &'a SourceCode,
    },
}
- The `Context` enumeration contains all the possible contexts in which an error can occur. For example, if the error occurs in the `Program` context, it means the error occurred at the top level of the program.
- The `Error` enumeration contains all the possible errors that can occur during parsing. The `LexicalError` variant wraps an error that occurred during lexical analysis. The other variants represent errors that occurred during parsing.
Writing The Parser
The parser is a state machine that behaves similarly to the lexer. It turns the token stream into a meaningful AST data structure. The parser we'll write is a combination of Recursive Descent Parsing and Operator-Precedence Parsing.
/// Represents a struct that parses the given token stream into abstract
/// syntax trees
pub struct Parser<'a> {
    // The lexer that is used to generate the token stream
    lexer: Lexer<'a>,
    // The accumulated tokens that have been tokenized by the lexer so far
    accumulated_tokens: Vec<Token<'a>>,
    // The accumulated errors that have been found during parsing so far
    accumulated_errors: Vec<Error<'a>>,
    // The current position in the token stream
    current_position: usize,
    // Flag that indicates whether the parser should produce errors into the
    // list or not
    produce_errors: bool,
}
In the following sections, I'll explain the fields of the `Parser` struct.
- The `lexer` field is used to generate the token stream. It's the struct we already implemented in the previous post.
- The `accumulated_tokens` field stores the tokens that have been tokenized by the lexer so far. The parser uses this field to retrieve tokens already tokenized by the lexer.
- The `accumulated_errors` field stores the errors found during parsing.
- The `current_position` field stores the current token stream index.
- The `produce_errors` field indicates whether the parser should produce errors into the list or not. This field prevents the parser from producing errors while recovering from an error.
Let's create a `new` constructor for the `Parser` struct.
impl<'a> Parser<'a> {
/// Creates a new parser instance from the given source code
pub fn new(source_code: &'a SourceCode) -> Self {
let mut lexer = Lexer::new(source_code);
let mut accumulated_errors = Vec::new();
// get the first token
let accumulated_tokens = vec![{
let token_return;
loop {
match lexer.lex() {
Ok(token) => {
token_return = token;
break;
}
Err(err) => {
accumulated_errors.push(Error::LexicalError(err));
}
}
}
token_return
}];
Self {
lexer,
accumulated_tokens,
accumulated_errors,
current_position: 0,
produce_errors: true,
}
}
}
The function creates a new `Lexer` instance from the source code. First, it gets the first token from the lexer and stores it in the `accumulated_tokens` field. Then, it creates a new `Parser` instance and returns it.
Now, let's implement the `next` function. This function returns the current token and moves the `current_position` field to the next token. If the `current_position` field is equal to the length of the `accumulated_tokens` field, it means the parser has reached the end of the token stream. In this case, the function gets the next token from the lexer and stores it in the `accumulated_tokens` field. Moreover, if the lexer encounters an error, the function stores the error in the `accumulated_errors` field.
impl<'a> Parser<'a> {
    // Gets the current token and moves the current position forward
    pub(crate) fn next(&mut self) -> &Token<'a> {
        // need to generate more tokens
        if self.current_position == self.accumulated_tokens.len() - 1 {
            let new_token;
            loop {
                match self.lexer.lex() {
                    Ok(token) => {
                        new_token = token;
                        break;
                    }
                    Err(err) => {
                        self.accumulated_errors.push(Error::LexicalError(err));
                    }
                }
            }
            // append the new token to the accumulated tokens
            self.accumulated_tokens.push(new_token);
        }
        // panic if the current position is greater than or equal to the length
        else if self.current_position >= self.accumulated_tokens.len() {
            panic!("current position is greater than or equal to the length of the accumulated tokens");
        }
        // increment the current position
        self.current_position += 1;
        &self.accumulated_tokens[self.current_position - 1]
    }
}
Now, let's implement the `peek` function. This function simply indexes the `accumulated_tokens` field with the `current_position` field. It does not move the position forward.
impl<'a> Parser<'a> {
    // Gets the current token without moving the current position forward
    fn peek(&self) -> &Token<'a> {
        &self.accumulated_tokens[self.current_position]
    }
    // Gets the token one position back without moving the current position
    fn peek_back(&self) -> &Token<'a> {
        &self.accumulated_tokens[self.current_position - 1]
    }
}
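To illustrate the contract between these functions, here's a stripped-down model of the cursor behavior. The `MiniParser` below is hypothetical, a toy over plain strings rather than the real `Parser`, but the `next`/`peek`/`peek_back` semantics are the ones described above:

```rust
// A hypothetical, simplified model of the parser's cursor (illustration only).
struct MiniParser {
    tokens: Vec<&'static str>,
    current_position: usize,
}

impl MiniParser {
    // Returns the current token and moves the position forward
    fn next(&mut self) -> &str {
        self.current_position += 1;
        self.tokens[self.current_position - 1]
    }
    // Returns the current token without moving the position
    fn peek(&self) -> &str {
        self.tokens[self.current_position]
    }
    // Returns the most recently consumed token
    fn peek_back(&self) -> &str {
        self.tokens[self.current_position - 1]
    }
}

fn main() {
    let mut p = MiniParser {
        tokens: vec!["let", "x", "=", "5"],
        current_position: 0,
    };
    assert_eq!(p.peek(), "let"); // peek does not advance
    assert_eq!(p.next(), "let"); // next returns the token, then advances
    assert_eq!(p.peek(), "x"); // now looking at the next token
    assert_eq!(p.peek_back(), "let"); // the token we just consumed
}
```

The real `Parser` differs only in that `next` lazily pulls fresh tokens from the lexer when the buffer runs out.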
Before going into writing the parsing functions, let's discuss insignificant tokens. In the previous post, we implemented a lexer that tokenizes comments and whitespace into tokens. However, most of the time, the syntax doesn't care about these whitespaces and comments. For example, the following code is valid in the language.
let x = 5; // extra spaces? it's okay!
let x=5; // no spaces? it's okay too!
let x = /*hey?*/ 5; // comment in the middle? that's fine!
However, in some cases, whitespace can be significant. For example, the only difference between the comparison operator `==` and two assignment operators `= =` is the space between them. In the following code, the first line assigns the value `5` to the variable `x`; the second line compares the value of `x` with `5`; the rest are invalid.
x = 5; // yes
x == 5; // yeah
x = = 5; // ???
x =/*hey?*/= 5; // what?
Therefore, we need to implement a function that skips insignificant tokens. The function is called `move_to_significant` and implemented as follows.
impl<'a> Parser<'a> {
    /// Moves the current position to the next significant token
    pub(crate) fn move_to_significant(&mut self) {
        while !self.peek().is_significant_token() {
            self.next();
        }
    }
}
And on top of that, implement a function that moves to the next significant token, returns it, and moves the position forward.
impl<'a> Parser<'a> {
    // Moves the current position to the next significant token and emits it
    pub(crate) fn next_significant(&mut self) -> &Token<'a> {
        self.move_to_significant();
        self.next()
    }
}
And if you're wondering how `is_significant_token` is implemented, here it is.
impl<'a> Token<'a> {
    /// Returns a boolean indicating whether this [`Token`] is significant.
    /// Insignificant tokens are spaces and comments.
    pub fn is_significant_token(&self) -> bool {
        match self.token_kind() {
            TokenKind::Space | TokenKind::Comment => false,
            _ => true,
        }
    }
}
Next, let's implement the function that appends an error to the `accumulated_errors` field if the `produce_errors` field is `true`.
impl<'a> Parser<'a> {
    // Appends the given error to the list of accumulated errors if the
    // `produce_errors` flag is set to true.
    #[must_use]
    pub fn append_error<T>(&mut self, error: Error<'a>) -> Option<T> {
        if self.produce_errors {
            self.accumulated_errors.push(error);
        }
        None
    }
}
Now that we've got the essential functions, let's implement the parsing functions.
Parsing The Qualified Name
Let's start with something simple: the function that parses a qualified name. A qualified name is a scope's name, such as `Simmypeet.Compiler.Parser`. It follows the pattern of an identifier followed by a dot, followed by another identifier, and so on.
pub fn parse_qualified_name(&mut self) -> Option<PositiionWrapper<String>> {
    self.move_to_significant();
    let starting_position = self.peek().position_range().start;
    // the string to return
    let mut string = String::new();
    // expect the first identifier
    if let TokenKind::Identifier = self.next().token_kind() {
        string.push_str(self.peek_back().lexeme());
    } else {
        return self.append_error(Error::IdentifierExpected {
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }
    // found additional scopes
    while matches!(self.peek().token_kind(), TokenKind::Punctuator('.')) {
        // consume the dot
        self.next();
        string.push('.');
        // expect the next identifier
        if let TokenKind::Identifier = self.next().token_kind() {
            string.push_str(self.peek_back().lexeme());
        } else {
            return self.append_error(Error::IdentifierExpected {
                found_token: self.peek_back().clone(),
                source_reference: self.source_code(),
            });
        }
    }
    Some(PositiionWrapper {
        position: starting_position..self.peek_back().position_range().end,
        value: string,
    })
}
The function first moves the position to the next significant token, as sometimes the parser's position is not at a significant token. Then, it records the starting position of the qualified name. This pattern is used in the rest of the parsing functions.
It then expects the first identifier and stores it in the `string` variable. If the first token is not an identifier, it records an error and returns `None`.
Finally, it checks whether a dot follows the identifier. If it does, it consumes the dot, consumes more identifiers, and pushes them onto the `string` variable. It then checks again whether the next token is a dot. If it is, it repeats the process. If it isn't, it returns the `string` variable.
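The grammar being recognized here is `Identifier ('.' Identifier)*`. As a standalone illustration, here's a hypothetical sketch that parses a qualified name from a plain string instead of a token stream; the helper and its validation rules are assumptions for this toy version, not the real parser:

```rust
// Hypothetical, string-based version of qualified-name parsing
// (illustration only; the real parser works over tokens).
fn parse_qualified_name(input: &str) -> Option<String> {
    // a toy identifier check: non-empty, alphanumerics and underscores
    fn is_identifier(s: &str) -> bool {
        !s.is_empty() && s.chars().all(|c| c.is_alphanumeric() || c == '_')
    }

    let mut string = String::new();
    let mut parts = input.split('.');
    // expect the first identifier
    let first = parts.next()?;
    if !is_identifier(first) {
        return None;
    }
    string.push_str(first);
    // found additional scopes: each dot must be followed by an identifier
    for part in parts {
        if !is_identifier(part) {
            return None;
        }
        string.push('.');
        string.push_str(part);
    }
    Some(string)
}

fn main() {
    assert_eq!(
        parse_qualified_name("Simmypeet.Compiler.Parser"),
        Some("Simmypeet.Compiler.Parser".to_string())
    );
    // a dot with no identifier after it is an error
    assert_eq!(parse_qualified_name("Simmypeet..Parser"), None);
}
```

The real implementation does the same loop, but reports an `IdentifierExpected` error instead of silently returning `None`.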
Parsing The Using Statement
Next, let's implement the function that parses the `using` statement. The statement uses the qualified name we implemented in the previous section. The parser will expect the `using` keyword, a qualified name, and a semicolon.
/// Parses the current token stream as a using statement
///
/// Using_Statement:
///     `using` Qualified_Name `;`
pub fn parse_using_statement(
    &mut self,
) -> Option<PositiionWrapper<Declaration<'a>>> {
    // move to the first significant token
    self.move_to_significant();
    let starting_position = self.peek().position_range().start;
    // expect the `using` keyword
    if !matches!(
        self.next().token_kind(),
        TokenKind::Keyword(Keyword::Using)
    ) {
        return self.append_error(Error::KeywordExpected {
            expected_keyword: Keyword::Using,
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }
    let qualified_name = self.parse_qualified_name()?;
    // expect the semicolon
    if !matches!(
        self.next_significant().token_kind(),
        TokenKind::Punctuator(';')
    ) {
        return self.append_error(Error::PunctuatorExpected {
            expected_punctuator: ';',
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }
    Some(PositiionWrapper {
        position: starting_position..self.peek_back().position_range().end,
        value: Declaration::UsingStatement {
            namespace_name: qualified_name,
        },
    })
}
Parsing The Namespace Declaration
This one is a bit more tricky. Before we start, let's look at our error recovery strategy. In a naive approach, we could stop parsing whenever the parser encounters an error. However, this is not a good idea, since it discards the rest of the possibly valid tokens. Instead, we can use an error recovery strategy to continue parsing the rest of the tokens after the error.
We'll use the Panic Mode Error Recovery strategy in our case. In this strategy, after the parser encounters an error, it looks for the next valid token and continues working from there, discarding the invalid tokens. This is the most widely used error recovery strategy, as it is simple to implement and effective.
Let's put ourselves in the parser's perspective. Consider the following code:
mod simmypeet {
    use something;
    use something_else;
    1234 420 // Oops! these shouldn't be here
    mod name {}
}
As you can see, there are number literals in the middle of the module declaration. When the parser encounters the number literals, it records an error saying something like "unexpected token: a number literal is not expected here". Then, it looks for the next valid token, which is the `mod` keyword. Our parser will not simply continue parsing from the `1234` token only to encounter another `420` token and record another error. Instead, it skips the `1234` and `420` tokens and continues parsing from the next `mod name` declaration.
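The skipping step can be sketched as follows. The `Tok` type and `skip_to` helper below are hypothetical stand-ins for illustration; the real parser's `skip_to` appears later in this post:

```rust
// Hypothetical token type for this sketch (illustration only).
#[derive(Debug, PartialEq, Clone)]
enum Tok {
    Number(i64),
    ModKeyword,
    CloseBrace,
}

/// Panic-mode recovery: discard tokens until the predicate matches a
/// synchronization point, returning its index (or `tokens.len()`).
fn skip_to(tokens: &[Tok], mut position: usize, predicate: impl Fn(&Tok) -> bool) -> usize {
    while position < tokens.len() && !predicate(&tokens[position]) {
        position += 1; // discard the invalid token
    }
    position
}

fn main() {
    // the offending `1234 420` tokens, followed by the next valid declaration
    let tokens = vec![
        Tok::Number(1234),
        Tok::Number(420),
        Tok::ModKeyword,
        Tok::CloseBrace,
    ];
    // recover by skipping to the next `mod` keyword or closing brace
    let next_valid = skip_to(&tokens, 0, |t| {
        matches!(t, Tok::ModKeyword | Tok::CloseBrace)
    });
    assert_eq!(next_valid, 2);
    assert_eq!(tokens[next_valid], Tok::ModKeyword);
}
```

Both invalid number tokens are skipped in one recovery step, so only a single error is reported for the whole stretch.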
Now that we've got the error recovery strategy, let's implement the namespace declaration parsing function. The parser expects the `namespace` keyword, followed by a qualified identifier, followed by an opening curly brace.
/// Parses the current token stream as a namespace declaration
///
/// Namespace_Declaration:
///     `namespace` Qualified_Name `{` Declaration* `}`
pub fn parse_namespace_declaration(
    &mut self,
) -> Option<PositiionWrapper<Declaration<'a>>> {
    // move to the first significant token
    self.move_to_significant();
    let starting_position = self.peek().position_range().start;
    // expect the `namespace` keyword
    if !matches!(
        self.next().token_kind(),
        TokenKind::Keyword(Keyword::Namespace)
    ) {
        return self.append_error(Error::KeywordExpected {
            expected_keyword: Keyword::Namespace,
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }
    let namespace_name = self.parse_qualified_name()?;
    // expect the opening curly bracket
    if !matches!(
        self.next_significant().token_kind(),
        TokenKind::Punctuator('{')
    ) {
        return self.append_error(Error::PunctuatorExpected {
            expected_punctuator: '{',
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }
    /* More to come ... */
}
Next, we'll instantiate a new `Vec` to store the declarations inside the namespace. Then, we'll start a loop that parses the declarations inside the namespace. The loop terminates when the parser encounters the closing curly brace or reaches the end of the file.
/// Parses the current token stream as a namespace declaration
///
/// Namespace_Declaration:
///     `namespace` Qualified_Name `{` Declaration* `}`
pub fn parse_namespace_declaration(
    &mut self,
) -> Option<PositiionWrapper<Declaration<'a>>> {
    /* From the previous code ... */
    // loop through all the declarations
    let mut declarations = Vec::new();
    // move to the next significant token
    self.move_to_significant();
    // loop through all the declarations until a closing curly bracket or
    // EOF is found
    while !matches!(
        self.peek().token_kind(),
        TokenKind::Punctuator('}') | TokenKind::EndOfFile
    ) {
        let declaration = match self.peek().token_kind() {
            TokenKind::Keyword(Keyword::Namespace) => {
                self.parse_namespace_declaration()
            }
            TokenKind::Keyword(Keyword::Using) => {
                self.parse_using_statement()
            }
            _ => self.append_error(Error::UnexpectedToken {
                context: Context::Namespace,
                found_token: self.peek().clone(),
                source_reference: self.source_code(),
            }),
        };
        if let Some(declaration) = declaration {
            declarations.push(declaration);
        } else {
            // error recovery: skip to the next declaration or the
            // closing curly bracket
            self.skip_to(|token| {
                matches!(
                    token.token_kind(),
                    TokenKind::Keyword(Keyword::Namespace)
                        | TokenKind::Keyword(Keyword::Using)
                        | TokenKind::Punctuator('}')
                )
            });
        }
        // move to the next significant token
        self.move_to_significant();
    }
    /* More to come ... */
}
You might have noticed that we're using the `skip_to` method to skip to the next available declaration or the closing curly brace. This helper method skips over tokens until the given predicate returns `true`. In our case, we skip tokens until we encounter the `namespace` or `using` keywords or the `}` token.
Finally, we'll expect the closing curly brace and return the namespace declaration.
/// Parses the current token stream as a namespace declaration
///
/// Namespace_Declaration:
///     `namespace` Qualified_Name `{` Declaration* `}`
pub fn parse_namespace_declaration(
    &mut self,
) -> Option<PositiionWrapper<Declaration<'a>>> {
    /* From the previous code ... */
    // expect the closing curly bracket
    if !matches!(
        self.next_significant().token_kind(),
        TokenKind::Punctuator('}')
    ) {
        return self.append_error(Error::PunctuatorExpected {
            expected_punctuator: '}',
            found_token: self.peek_back().clone(),
            source_reference: self.source_code(),
        });
    }
    Some(PositiionWrapper {
        position: starting_position..self.peek_back().position_range().end,
        value: Declaration::NamespaceDeclaration {
            namespace_name,
            declarations,
        },
    })
}
Parsing The Program
Finally, we'll write the `parse_program` method that parses the entire program. Fortunately, a program is just a list of declarations, so we'll simply loop through all the declarations and parse them similarly to how we parsed the namespace declaration.
/// Parses the whole token stream as a program
///
/// Program:
///     Declaration*
pub fn parse_program(&mut self) -> Option<Program<'a>> {
    let mut program = Program {
        declarations: Vec::new(),
    };
    // move to the next significant token
    self.move_to_significant();
    while !matches!(self.peek().token_kind(), TokenKind::EndOfFile) {
        let declaration = match self.peek().token_kind() {
            TokenKind::Keyword(Keyword::Namespace) => {
                self.parse_namespace_declaration()
            }
            TokenKind::Keyword(Keyword::Using) => {
                self.parse_using_statement()
            }
            _ => self.append_error(Error::UnexpectedToken {
                context: Context::Program,
                found_token: self.peek().clone(),
                source_reference: self.source_code(),
            }),
        };
        if let Some(declaration) = declaration {
            program.declarations.push(declaration);
        } else {
            // error recovery: skip to the next declaration
            self.skip_to(|token| {
                matches!(
                    token.token_kind(),
                    TokenKind::Keyword(Keyword::Namespace)
                        | TokenKind::Keyword(Keyword::Using)
                )
            });
        }
        // move to the next significant token
        self.move_to_significant();
    }
    Some(program)
}
Summary
Phew!! 😮‍💨😮‍💨 We've created a simple language parser that can understand basic program structures. However, we're not done yet; in the next post, we'll continue writing the parser, which will parse function declarations.
As always, you can find the source code for this post on GitHub.
Feel free to comment if you have any thoughts, questions, feedback, or suggestions. I'll be happy to read them 😊.