Writing a Transpiler For a Subset of the Markdown Language

For a personal project, I wanted to create a Table of Contents from given Markdown content. However, the Markdown content had to produce a different HTML output than the CommonMark specification. The rules for getting from Markdown to HTML were similar to how Rust's mdBook handles its Table of Contents.

While it's possible to do an ad-hoc conversion without paying too much attention to "how it's supposed to be done", I'm learning more about compilers, lexers, parsers, and transpilers, so I thought it was a good way to dip my toes into the field.

This tool is a transpiler (source to source) rather than a compiler (higher-level to lower-level code), because the two languages we're converting between, Markdown (CommonMark specification) and HTML, are both markup languages.






Objective

We want to go from this:


# Title

[About](index.md)

- [Getting Started]()

- [CLI Usage](usage/usage.md)
  - [init](usage/init.md)

to this:


<ol class="toc">
  <li class="part-title">Title</li>

  <li><a href="/about">About</a></li>

  <li class="chapter-item">
    <strong>1.</strong>
    <a class="active" href="/index.html">About</a>
  </li>

  <li>
    <ol class="section">
      <li class="chapter-item draft">
        <strong>5.1.</strong>
        Table of Contents Format
      </li>
    </ol>
  </li>
  <li class="part-title">Development</li>
  <li class="chapter-item">
    <strong>6.</strong>

    <a class="" href="/contributing.html">Contributing</a>
  </li>
</ol>

How do we go about doing it? Intuitively, we can map the character sets between the two formats. For instance,

# Title

maps to

<li class="part-title">Title</li>

and

[About](index.md)

maps to


<li class="chapter-item">
  <strong>1.</strong>
  <a class="active" href="/index.html">About</a>
</li>

and

- [init](usage/init.md)

maps to


<li>
  <ol class="section">
    <li class="chapter-item draft">
      <strong>5.1.</strong>
      Table of Contents Format
    </li>
  </ol>
</li>

To go further into mapping between the two sets, we can look at specific characters:

  • A hash/number sign # in the Markdown format denotes that we're starting a list tag <li></li>; perhaps we can call this a title
  • A left bracket [ followed by letters, spaces, and/or other characters, closed by a ], and then followed by a pair of parentheses (), denotes a link tag <a></a>
  • A dash - in the Markdown format maps to the start of a list item tag <li></li> in the HTML format. When combined with a link tag, it also adds numbering, so we can call this a list item

Finally, we wrap everything in an ordered list <ol></ol> tag.
As you've probably noticed, we haven't introduced any formal terminology or discussed how we'll map between the two formats, so that's what we'll look at next.
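Before formalizing anything, it helps to see what the ad-hoc approach could look like. This is a minimal sketch, assuming exactly one construct per line; the function name `naiveLine` is hypothetical. It deliberately ignores numbering, nesting, and active links, which is where a regex-per-line approach starts to strain.

```typescript
// A hypothetical regex-per-line mapping; not the approach used in this article.
function naiveLine(line: string): string {
  // "# Title" -> part title
  const header = /^#\s+(.*)$/.exec(line);
  if (header) return `<li class="part-title">${header[1].trim()}</li>`;

  // "  - [text](ref)" -> list item; the leading spaces are matched but nesting is ignored
  const item = /^(\s*)-\s+\[([^\]]*)\]\(([^)]*)\)$/.exec(line);
  if (item) {
    const [, , title, ref] = item;
    return ref
      ? `<li class="chapter-item"><a href="${ref}">${title}</a></li>`
      : `<li class="chapter-item draft">${title}</li>`;
  }

  // "[text](ref)" -> plain link
  const link = /^\[([^\]]*)\]\(([^)]*)\)$/.exec(line);
  if (link) return `<li><a href="${link[2]}">${link[1]}</a></li>`;

  return ""; // blank or unsupported line
}

const markdown = "# Title\n\n[About](index.md)\n\n- [Getting Started]()";
const html = `<ol class="toc">${markdown.split("\n").map(naiveLine).join("")}</ol>`;
console.log(html);
```

Getting `- [init]` to nest under `- [CLI Usage]` would require tracking indentation across lines, which is the point where a real lexer and parser start paying off.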



Terminology and Mapping

Before we start delving into the code, let's introduce some useful terminology for the components we'll need.

First, we'll need a way of reading the characters from the text stream.
This component is called the Reader: given a text, we iterate through each of its characters.

Next, we'll start combining characters into meaningful tokens. This is called lexing and is the job of the Lexer.

Afterward, we'll combine the different tokens to produce a Syntax Tree. This is called parsing and is done by the Parser.

In the final step, we'll walk through the generated Syntax Tree and write a Renderer function that generates the HTML.

Additionally, you can divide the components into two stacks: a frontend and a backend. The frontend is usually responsible for lexing and parsing the code, while the backend transforms the code into something else.

In our case, the frontend stack consists of the following components:

  • Reader
  • Lexer
  • Parser

And the backend stack consists of only one component:

  • Renderer



Reader

The Reader is the simplest of the components: its only purpose is to retrieve characters from different sources, be it a string, a file, etc. It keeps internal state about which character it's currently reading, and exposes a few public helper methods for retrieving a single character from the stream.


'hello + world' -> ['h','e','l','l','o',' ','+',' ','w','o','r','l','d']



Lexer

The Lexer is the next simplest part of the frontend stack. Its main purpose is to create meaningful tokens from the stream of characters it gets from the Reader (this is also called lexical analysis). It discards unimportant characters such as spaces, unless they are part of the lexical grammar, as in Python. Like the Reader, it exposes some public methods that let you extract the newly minted tokens.


['h','e','l','l','o',' ','+',' ','w','o','r','l','d'] -> ['hello', '+', 'world']



Parser

In our program, the Parser is the most advanced component. It is responsible for producing a Syntax Tree or Abstract Syntax Tree (AST), a tree structure that represents meaningful grammar. The difference between the two is that a Syntax Tree contains all the information of the source code, including superfluous whitespace, whereas an AST is an abstract representation of the source code that leaves out superfluous details.


['hello', '+', 'world']

->

[
  { token: 'string', lexeme: 'hello' },
  { token: 'string', lexeme: '+' },
  { token: 'string', lexeme: 'world' },
]



Renderer

The Renderer does the final transformation, from the AST/Syntax Tree to the desired output language, in our case HTML.


[
  { token: 'string', lexeme: 'hello' },
  { token: 'string', lexeme: '+' },
  { token: 'string', lexeme: 'world' },
]

->

<div>hello</div>
<div>+</div>
<div>world</div>
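The four toy transformations above can be chained into a tiny pipeline. This is a minimal sketch for the `'hello + world'` example only, assuming words and single symbols are the only lexemes. The function names `read`, `lex`, `parse` and `render` are hypothetical and much simpler than the components built below.

```typescript
type ToyNode = { token: "string"; lexeme: string };

// Reader: string -> characters (spaces included; the lexer decides what to drop)
function read(source: string): string[] {
  return source.split("");
}

// Lexer: characters -> lexemes, grouping letters into words and dropping spaces
function lex(chars: string[]): string[] {
  const lexemes: string[] = [];
  let word = "";
  for (const c of chars) {
    if (/[a-z]/i.test(c)) {
      word += c;
      continue;
    }
    if (word) {
      lexemes.push(word);
      word = "";
    }
    if (c !== " ") lexemes.push(c); // "+" and other symbols become their own lexeme
  }
  if (word) lexemes.push(word);
  return lexemes;
}

// Parser: lexemes -> a flat list of nodes (a degenerate "tree")
function parse(lexemes: string[]): ToyNode[] {
  return lexemes.map((lexeme): ToyNode => ({ token: "string", lexeme }));
}

// Renderer: nodes -> HTML
function render(nodes: ToyNode[]): string {
  return nodes.map((n) => `<div>${n.lexeme}</div>`).join("\n");
}

console.log(render(parse(lex(read("hello + world")))));
```

Each stage only knows about the output of the previous one, which is what makes it possible to test and replace them independently.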



The Road to HyperText Markup Language

Next up, we'll implement the Reader, Lexer, Parser and Renderer.



1: Build the Reader

Let's start with the simplest component, the Reader. In our case, it's simply a function that takes a string as input and provides some methods to interact with that string. Like almost all of the components, it's stateful: it keeps track of which character we're currently reading.

The main methods it exposes are:

  • peek: look up the nth character ahead of where we currently are, without moving
  • consume: return the current character (or the nth one ahead) and move past it
  • isEOF: check whether we have reached the end of the string

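function TocReader(chars: string) {
  // Index of the next character to read
  let i = 0;

  // Look at the character nth positions ahead without moving the cursor
  function peek(nth: number = 0) {
    return chars[i + nth];
  }

  // Return the character nth positions ahead and move the cursor past it
  function consume(nth: number = 0) {
    const c = chars[i + nth];
    i = i + nth + 1;

    return c;
  }

  // True once every character has been consumed
  function isEOF() {
    return chars.length - 1 < i;
  }

  return Object.freeze({
    peek,
    consume,
    isEOF,
  });
}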
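To make the `peek`/`consume` semantics concrete, here is the same reader, copied so the snippet runs on its own, exercised on a short input. Note that `consume(nth)` returns the character `nth` positions ahead and moves the cursor past it, silently skipping the characters in between.

```typescript
function TocReader(chars: string) {
  let i = 0; // index of the next unread character

  function peek(nth: number = 0) {
    return chars[i + nth];
  }

  function consume(nth: number = 0) {
    const c = chars[i + nth];
    i = i + nth + 1;
    return c;
  }

  function isEOF() {
    return chars.length - 1 < i;
  }

  return Object.freeze({ peek, consume, isEOF });
}

const reader = TocReader("# Hi");
console.log(reader.peek()); // "#" (the cursor does not move)
console.log(reader.consume()); // "#" (the cursor moves to " ")
console.log(reader.consume(1)); // "H" (the space is skipped)
console.log(reader.consume()); // "i"
console.log(reader.isEOF()); // true
```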



2: Define the grammar

Next up, we'll want to identify the tokens we'll allow in our DSL (Domain-Specific Language).

We want to take this text:


# Title

[About](index.md)

---

- [Getting Started]()

- [CLI Usage](usage/usage.md)
  - [init](usage/init.md)

and find the tokens that represent it. One interpretation is:


| Text                  | Tokens                                                                                  |
|-----------------------|-----------------------------------------------------------------------------------------|
| # Title               | [HASH, SPACE, STRING]                                                                   |
| [About](index.md)     | [LEFT_BRACE, STRING, RIGHT_BRACE, LEFT_PAREN, STRING, RIGHT_PAREN]                      |
| ---                   | [HORIZONTAL_RULE]                                                                       |
| - [Getting Started]() | [DASH, SPACE, LEFT_BRACE, STRING, RIGHT_BRACE, LEFT_PAREN, RIGHT_PAREN]                 |
| - [Usage](usage.md)   | [DASH, SPACE, LEFT_BRACE, STRING, RIGHT_BRACE, LEFT_PAREN, STRING, RIGHT_PAREN]         |
|   - [Init](init.md)   | [INDENT, DASH, SPACE, LEFT_BRACE, STRING, RIGHT_BRACE, LEFT_PAREN, STRING, RIGHT_PAREN] |
|                       | [EOF]                                                                                   |

Then the tokens are:


enum TokenType {
  LEFT_PAREN,
  RIGHT_PAREN,
  LEFT_BRACE,
  RIGHT_BRACE,
  SPACE,
  INDENT,
  DASH,

  STRING,
  HASH,
  HORIZONTAL_RULE,

  EOF,
}

interface Token {
  type: TokenType;
  lexeme: string | null;
}

The Token interface, which represents a single token, consists of two properties: type and lexeme. The type is set to one of the TokenType members, and the lexeme is the value of the token: null if the token carries no value, and a string otherwise.
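Here are the two declarations in use. Because `TokenType` is a numeric enum, each member is an integer starting at 0, and TypeScript also emits a reverse mapping from value back to name. That reverse mapping is what lets a debug helper print `TokenType[k.type]` instead of a bare number. The sample tokens are illustrative.

```typescript
enum TokenType {
  LEFT_PAREN,
  RIGHT_PAREN,
  LEFT_BRACE,
  RIGHT_BRACE,
  SPACE,
  INDENT,
  DASH,

  STRING,
  HASH,
  HORIZONTAL_RULE,

  EOF,
}

interface Token {
  type: TokenType;
  lexeme: string | null;
}

const tokens: Token[] = [
  { type: TokenType.HASH, lexeme: "Title" }, // value-carrying token
  { type: TokenType.DASH, lexeme: null }, // punctuation carries no value
  { type: TokenType.EOF, lexeme: null },
];

// TokenType[value] looks up the member name, e.g. TokenType[TokenType.HASH] is "HASH"
for (const t of tokens) {
  console.log(`type: ${TokenType[t.type]} lexeme: ${t.lexeme}`);
}
```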

The rules that determine how a particular language groups characters into lexemes are called its lexical grammar. Next up, we'll build our Lexer.



3: Begin Lexing

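Like the Reader component, the Lexer has three public methods, which the Parser will use to construct the AST:

  • peek: look up the type of the nth token ahead
  • consume: return the current token (or the nth one ahead) and move past it
  • isEOF: check whether every token has been consumed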

A lexer is a stateful object: it contains a pointer/index to the current element being read and an array of tokens. The core of the lexer is the function responsible for the actual lexing: creating tokens from a string of characters.
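Because the lexer's public surface (`peek`, `consume`, `isEOF`) mirrors the reader's, one option, not the one taken in this article, is to factor the cursor into a single generic helper that works over characters and tokens alike. A minimal sketch; `createCursor` is a hypothetical name.

```typescript
// A generic cursor over any array: characters for a reader, tokens for a lexer.
function createCursor<T>(items: readonly T[]) {
  let i = 0; // index of the next unread item

  return Object.freeze({
    // Look nth items ahead without moving
    peek: (nth = 0): T | undefined => items[i + nth],
    // Return the item nth positions ahead and move past it
    consume: (nth = 0): T | undefined => {
      const item = items[i + nth];
      i += nth + 1;
      return item;
    },
    // Same check as `length - 1 < i`, written the other way round
    isEOF: () => i >= items.length,
  });
}

const chars = createCursor("[a]".split(""));
const tokens = createCursor([
  { type: "LEFT_BRACE", lexeme: null },
  { type: "STRING", lexeme: "a" },
]);
console.log(chars.consume(), tokens.peek(1)?.lexeme);
```

The trade-off is that the lexer's `peek` in this article returns only the token's type, which keeps the parser's comparisons short; a shared cursor would return whole tokens instead.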

The private method scanToken is responsible for lexing. It steps through the input stream (the output of the Reader) and is driven by the run method, a while loop that calls scanToken until the Reader is exhausted and then appends an EOF token.

There are several ways to implement scanToken. You could use something like a recursive state function that calls itself until it hits the EOF character, but I chose a simple switch statement instead, with the outer loop in run responsible for calling scanToken until we hit EOF.

The switch statement looks at the current character read from the Reader and tries to match it against a token in the lexical grammar. Simple tokens, such as LEFT_BRACE, match one to one and are added directly, but others, such as HORIZONTAL_RULE, use reader.peek to look ahead. For more complicated symbols, such as strings, we define a dedicated function that may look several characters ahead.


function TocLexer(reader: ReturnType<typeof TocReader>) {
  // Index of the next token handed to the parser
  let i = 0;
  const tokens: Token[] = [];

  // The whole input is lexed eagerly when the lexer is created
  run();

  function run() {
    while (!reader.isEOF()) {
      scanToken();
    }

    addToken(TokenType.EOF);
  }

  function scanToken() {
    const c = reader.consume();

    switch (c) {
      case "\n":
        break;
      case " ":
        // Two consecutive spaces form one indentation level; a single space is ignored
        if (reader.peek() === " ") {
          addToken(TokenType.INDENT);
          reader.consume();
        }
        break;
      case "[":
        addToken(TokenType.LEFT_BRACE);
        break;
      case "]":
        addToken(TokenType.RIGHT_BRACE);
        break;
      case "(":
        addToken(TokenType.LEFT_PAREN);
        break;
      case ")":
        addToken(TokenType.RIGHT_PAREN);
        break;
      case "#":
        header();
        break;
      case "-":
        // "---" is a horizontal rule; a lone "-" starts a list item
        if (reader.peek() === "-" && reader.peek(1) === "-") {
          reader.consume();
          reader.consume();
          addToken(TokenType.HORIZONTAL_RULE);
        } else {
          addToken(TokenType.DASH);
        }
        break;
      default:
        stringLiteral(c);
        break;
    }
  }

  // "# Title": the rest of the line, trimmed, becomes the HASH token's lexeme
  function header() {
    let text = "";
    while (reader.peek() !== "\n" && !reader.isEOF()) {
      text += reader.consume();
    }

    addToken(TokenType.HASH, text.trim());
  }

  // Reads text up to the end of the line or an unbalanced "]" or ")"
  function stringLiteral(c: string) {
    let text = c;
    let unClosedParens = 0;
    let unClosedBraces = 0;

    while (reader.peek() !== "\n" && !reader.isEOF()) {
      const nth = reader.peek();

      if (nth === "(") {
        unClosedParens += 1;
      } else if (nth === ")" && unClosedParens === 0) {
        break;
      } else if (nth === ")") {
        unClosedParens -= 1;
      } else if (nth === "[") {
        unClosedBraces += 1;
      } else if (nth === "]" && unClosedBraces === 0) {
        break;
      } else if (nth === "]") {
        unClosedBraces -= 1;
      }

      // Brackets and parens inside the string are kept as long as they are balanced
      text += reader.consume();
    }

    addToken(TokenType.STRING, text);
  }

  function addToken(type: TokenType, lexeme: string | null = null) {
    tokens.push({ type, lexeme });
  }

  // Type of the token nth positions ahead, or undefined past the end
  function peek(nth: number = 0) {
    return tokens[i + nth]?.type;
  }

  // Return the token nth positions ahead and move past it
  function consume(nth: number = 0) {
    const t = tokens[i + nth];
    i = i + nth + 1;

    return t;
  }

  function isEOF() {
    return tokens.length - 1 < i;
  }

  // Debug helpers
  function printTokens() {
    tokens.forEach((k) => {
      console.log(`type: ${TokenType[k.type]} lexeme: ${k.lexeme}`);
    });
  }

  function printTokenType() {
    tokens.forEach((k) => {
      console.log(`${TokenType[k.type]}`);
    });
  }

  return Object.freeze({ peek, consume, isEOF, printTokens, printTokenType });
}
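For comparison, here is the state-function alternative mentioned earlier, in the style popularized by Rob Pike's "Lexical Scanning in Go" talk: each state consumes some input and returns the next state. This is a minimal sketch restricted to brackets, parentheses and plain strings. The names `lexLinks`, `State`, `lexText`, `lexString` and `singleType` are hypothetical, and token types are plain strings rather than the `TokenType` enum.

```typescript
type Tok = { type: string; lexeme: string | null };
type Lx = { input: string; pos: number; tokens: Tok[] };
// A state does some lexing and returns the next state, or null when lexing is done
type State = (lx: Lx) => State | null;

// Single-character tokens; anything else is whitespace or part of a string
function singleType(c: string): string | undefined {
  switch (c) {
    case "[": return "LEFT_BRACE";
    case "]": return "RIGHT_BRACE";
    case "(": return "LEFT_PAREN";
    case ")": return "RIGHT_PAREN";
    default: return undefined;
  }
}

// Default state: decide what comes next based on the current character
function lexText(lx: Lx): State | null {
  if (lx.pos >= lx.input.length) {
    lx.tokens.push({ type: "EOF", lexeme: null });
    return null;
  }
  const c = lx.input[lx.pos];
  const single = singleType(c);
  if (single) {
    lx.tokens.push({ type: single, lexeme: null });
    lx.pos++;
    return lexText;
  }
  if (c === " " || c === "\n") {
    lx.pos++; // whitespace is simply skipped in this sketch
    return lexText;
  }
  return lexString;
}

// String state: consume until a delimiter or newline, emit STRING, then go back to lexText
function lexString(lx: Lx): State | null {
  const start = lx.pos;
  while (lx.pos < lx.input.length && lx.input[lx.pos] !== "\n" && !singleType(lx.input[lx.pos])) {
    lx.pos++;
  }
  lx.tokens.push({ type: "STRING", lexeme: lx.input.slice(start, lx.pos) });
  return lexText;
}

function lexLinks(input: string): Tok[] {
  const lx: Lx = { input, pos: 0, tokens: [] };
  // Run the machine: each state hands back the next one until a state returns null
  let state: State | null = lexText;
  while (state) {
    state = state(lx);
  }
  return lx.tokens;
}

console.log(lexLinks("[About](index.md)").map((t) => t.type).join(" "));
```

Driving the states from a loop rather than having each state call the next directly keeps the call stack flat, so long inputs cannot overflow it.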

Finally, after we have lexed the input, we end up with an array of tokens. For the line [About](index.md), it looks like this:


const tokens: Token[] = [
  { type: TokenType.LEFT_BRACE, lexeme: null },
  { type: TokenType.STRING, lexeme: "About" },
  { type: TokenType.RIGHT_BRACE, lexeme: null },
  { type: TokenType.LEFT_PAREN, lexeme: null },
  { type: TokenType.STRING, lexeme: "index.md" },
  { type: TokenType.RIGHT_PAREN, lexeme: null },
  { type: TokenType.EOF, lexeme: null },
];



4: Generate an Abstract Syntax Tree

Now that we have our tokens, we can start assembling them into an AST. To do so, we need a formal description of how the AST is constructed, which we'll provide by writing a formal(ish) grammar.

A formal grammar is a pseudo-code(ish) description of our DSL. There are several academic formalisms for this, such as parsing expression grammars (PEG), top-down parsing language (TDPL), and context-free grammars (CFG). They're all very formal, and since this is a less formal project, we won't be strict with our notation.

Next up, we'll create a set of production rules: rewrite rules that substitute symbols with other symbols and that can be applied recursively to generate new symbol sequences.

The production rules are of the form:

A -> a

where A is a nonterminal symbol and a is a string of terminals and/or nonterminals.


  toc        -> expression* EOF
  expression -> hr | header | link | list

  hr         -> "---"
  header     -> "#" STRING
  link       -> "[" STRING "]" "(" STRING? ")"
  list       -> INDENT* "-" link

Terminal and nonterminal symbols are defined as follows:

  • terminal: when we encounter a terminal symbol, we don't do any substitutions. For instance, --- and STRING are terminal symbols
  • nonterminal: a nonterminal symbol expands into other symbols, so we have to keep expanding it until we reach terminal symbols. In our case, expression, hr, header, link, and list are nonterminal symbols
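One way to pin the grammar down before writing the parser is to encode the node shapes as a TypeScript discriminated union. This is a sketch whose field names follow the AST printed later in this article; the article's own parser uses `any` instead, and `toMarkdown` is a hypothetical helper that turns a tree back into Markdown.

```typescript
// Node shapes taken from the AST output in this article; the union itself is a sketch
type HrNode = { type: "hr"; indent: number };
type HeaderNode = { type: "header"; text: string; indent: number };
type LinkNode = { type: "link"; title: string; ref: string; indent: number };
type ListItemNode = { type: "listItem"; title: string; ref: string; indent: number };
type ListNode = { type: "list"; children: TocNode[] };
type TocNode = HrNode | HeaderNode | LinkNode | ListItemNode | ListNode;

// The compiler checks that every node type is handled
function toMarkdown(node: TocNode): string {
  switch (node.type) {
    case "hr":
      return "---";
    case "header":
      return `# ${node.text}`;
    case "link":
      return `[${node.title}](${node.ref})`;
    case "listItem":
      return `${"  ".repeat(node.indent)}- [${node.title}](${node.ref})`;
    case "list":
      return node.children.map(toMarkdown).join("\n");
    default: {
      // Adding a node type without handling it here becomes a compile error
      const unreachable: never = node;
      return unreachable;
    }
  }
}

const ast: TocNode = {
  type: "list",
  children: [
    { type: "header", text: "Title", indent: 0 },
    { type: "listItem", title: "CLI Usage", ref: "usage/usage.md", indent: 0 },
    { type: "list", children: [{ type: "listItem", title: "init", ref: "usage/init.md", indent: 1 }] },
  ],
};
console.log(toMarkdown(ast));
```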

Now let's implement this formal grammar. We start with the parse function, which runs through the tokens we got from the Lexer until it reaches the EOF token. For each token, it parses an expression (with a switch statement), using the Lexer's peek and consume methods.


function TocParser(lexer: ReturnType<typeof TocLexer>) {
  // Node constructors; everything except list items sits at indentation level 0
  function hr() {
    return { type: "hr", indent: 0 };
  }

  function header(text: string) {
    return { type: "header", text, indent: 0 };
  }

  function link(title: string, ref: string = "") {
    return { type: "link", title, ref, indent: 0 };
  }

  function list(children: any[] = []) {
    return { type: "list", children };
  }

  function listItem(title: string, ref: string, indent: number): any {
    return {
      type: "listItem",
      title,
      ref,
      indent,
    };
  }

  // Turn the flat list of statements into nested lists based on each statement's indent
  function createAST(statements: any[]): any {
    const listRefs: Record<number, any> = { 0: list() };
    statements.forEach((v: any, i: number, arr: any[]) => {
      // Indentation decreased: close every list deeper than the current level
      if (i > 0 && arr[i - 1].indent > v.indent) {
        for (const level of Object.keys(listRefs).map(Number)) {
          if (level > v.indent) {
            delete listRefs[level];
          }
        }
      }

      if (listRefs.hasOwnProperty(v.indent)) {
        listRefs[v.indent].children.push(v);
      } else {
        // First statement at a deeper level: open a new list inside the parent one
        listRefs[v.indent] = list([v]);
        listRefs[v.indent - 1].children.push(listRefs[v.indent]);
      }
    });

    return listRefs[0];
  }

  function parse() {
    const statements: any[] = [];
    while (!lexer.isEOF()) {
      const expr = expression();

      if (expr) {
        statements.push(expr);
      }
    }

    return createAST(statements);
  }

  function expression(numIndent = 0): any {
    const token = lexer.consume();

    switch (token.type) {
      case TokenType.HORIZONTAL_RULE:
        return hr();
      case TokenType.HASH:
        return header(token.lexeme ?? "");
      case TokenType.LEFT_BRACE:
        return parseLink();
      case TokenType.INDENT:
        // Every INDENT adds one nesting level to the list item that follows
        return expression(numIndent + 1);
      case TokenType.DASH:
        return parseListItem(numIndent);
      default:
        // Tokens that don't start an expression (such as EOF) produce nothing
        return undefined;
    }
  }

  // Parses `title](ref)` right after a "[" was consumed; ref may be empty, as in [Getting Started]()
  function parseLinkParts(): { title: string; ref: string } | undefined {
    if (
      lexer.peek() !== TokenType.STRING ||
      lexer.peek(1) !== TokenType.RIGHT_BRACE ||
      lexer.peek(2) !== TokenType.LEFT_PAREN
    ) {
      return undefined;
    }

    const title = lexer.consume().lexeme ?? "";
    lexer.consume(); // "]"
    lexer.consume(); // "("

    let ref = "";
    if (lexer.peek() === TokenType.STRING) {
      ref = lexer.consume().lexeme ?? "";
    }

    lexer.consume(); // ")"
    return { title, ref };
  }

  function parseListItem(numIndent = 0): any {
    if (lexer.peek() !== TokenType.LEFT_BRACE) {
      return undefined;
    }

    lexer.consume(); // "["
    const parts = parseLinkParts();
    return parts && listItem(parts.title, parts.ref, numIndent);
  }

  function parseLink(): any {
    const parts = parseLinkParts();
    return parts && link(parts.title, parts.ref);
  }

  return parse();
}
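The heart of `createAST` is turning a flat sequence of statements, each tagged with an `indent` level, into nested lists. Below is a minimal alternative sketch of just that step, using an explicit stack of open lists instead of an object keyed by indent; `nestByIndent`, `Flat` and `ListNode` are hypothetical names.

```typescript
type Flat = { type: string; title?: string; indent: number };
type ListNode = { type: "list"; children: Array<Flat | ListNode> };

// Sketch: nest an indent-annotated statement list using a stack of open lists
function nestByIndent(statements: Flat[]): ListNode {
  const root: ListNode = { type: "list", children: [] };
  // stack[d] is the list that is currently open at indentation depth d
  const stack: ListNode[] = [root];

  for (const s of statements) {
    // Indentation went down: close every list deeper than this statement
    while (stack.length - 1 > s.indent) stack.pop();
    // Indentation went up: open one new list per missing level
    while (stack.length - 1 < s.indent) {
      const child: ListNode = { type: "list", children: [] };
      stack[stack.length - 1].children.push(child);
      stack.push(child);
    }
    stack[stack.length - 1].children.push(s);
  }
  return root;
}

const tree = nestByIndent([
  { type: "listItem", title: "a", indent: 0 },
  { type: "listItem", title: "b", indent: 1 },
  { type: "listItem", title: "c", indent: 2 },
  { type: "listItem", title: "d", indent: 0 },
]);
console.log(JSON.stringify(tree, null, 2));
```

Popping the stack closes every deeper list at once, which matters when the indentation drops by two or more levels in one step. The inner `while` also opens intermediate lists if the indentation jumps by more than one level, a case where a lookup of `listRefs[v.indent - 1]` would find no parent.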

For instance, if we were to parse the following Markdown snippet:


# markBook

---

[About](index.md)

- [Getting Started](getting-started.md)
  - [Getting Started](getting-started.md)
    - [Getting Started](getting-started.md)

We would end up with this AST:


{
  "type": "list",
  "children": [
    {
      "type": "header",
      "text": "markBook",
      "indent": 0
    },
    {
      "type": "hr",
      "indent": 0
    },
    {
      "type": "link",
      "title": "About",
      "ref": "index.md",
      "indent": 0
    },
    {
      "type": "listItem",
      "title": "Getting Started",
      "ref": "getting-started.md",
      "indent": 0
    },
    {
      "type": "list",
      "children": [
        {
          "type": "listItem",
          "title": "Getting Started",
          "ref": "getting-started.md",
          "indent": 1
        },
        {
          "type": "list",
          "children": [
            {
              "type": "listItem",
              "title": "Getting Started",
              "ref": "getting-started.md",
              "indent": 2
            }
          ]
        }
      ]
    }
  ]
}



5: Generate HTML

Now that we have our AST, we can implement the Renderer, which generates the desired HTML. The function responsible for this transformation is TocRender, which takes the site configuration, the AST, and the current file name as input and loops through the AST to generate our HTML.

We start by writing our main loop, which handles all of the AST node types and wraps the result in an ol tag:


let order = 0; // number of top-level list items rendered so far
return `<ol class="toc">${ast.children
  .map((e: any) => {
    switch (e.type) {
      case "hr":
        return hr();
      case "header":
        return header(e);
      case "link":
        return link(e);
      case "listItem":
        order += 1;
        return listItem(e, [order]);
      case "list":
        // A nested list continues the numbering of the preceding item
        return list(e, [order]);
      default:
        return "";
    }
  })
  .join("")}</ol>`;

Next, we write functions to handle the different HTML tags:


// These helpers live inside TocRender, which provides `site`, `activePage` and `path` (Node's path module)

function formatUrl(currentFileName: string) {
  let link = stripExtension(currentFileName);
  link = link.replace(site.paths.content, "");

  return link;
}

// "usage/init.md" -> "/usage/init" (or "/usage/init.html" with uglyURLs)
function stripExtension(url: string) {
  let link = path.join("/", path.dirname(url), path.basename(url, ".md"));

  if (site.uglyURLs) {
    link += ".html";
  }

  return link;
}

function hr() {
  return '<li class="spacer"></li>';
}

function header(e: any) {
  return `<li class="part-title">${e.text}</li>`;
}

function link(e: any): string {
  let ref = e.ref !== "" ? stripExtension(e.ref) : "";
  const linkClass = ref === activePage ? "active" : "";

  // We handle the root index.md differently: it is served at "/"
  if (ref === "/index") {
    ref = "/";
  }

  // Only prefix real links, so drafts (empty ref) stay drafts
  if (ref && site.rootUrl) {
    ref = `${site.rootUrl}${ref}`;
  }

  return ref
    ? `<li><a class="${linkClass}" href="${ref}">${e.title}</a></li>`
    : `<li class="draft">${e.title}</li>`;
}

function listItem(e: any, order: number[]) {
  let ref = e.ref !== "" ? stripExtension(e.ref) : "";
  const linkClass = ref === activePage ? "active" : "";

  // We handle the root index.md differently: it is served at "/"
  if (ref === "/index") {
    ref = "/";
  }

  if (ref && site.rootUrl) {
    ref = `${site.rootUrl}${ref}`;
  }

  // [2, 1] -> "2.1."
  const label = [...order, ""].join(".");

  return ref
    ? `
    <li class="chapter-item">
      <strong>${label}</strong>
      &nbsp;
      <a class="${linkClass}"
         href="${ref}">${e.title}</a>
    </li>
  `
    : `
    <li class="chapter-item draft">
      <strong>${label}</strong>
      &nbsp;
      ${e.title}
    </li>
  `;
}

function list(e: any, order: number[]): string {
  // Count list items only, so a nested list continues the number of the item before it
  let n = 0;
  return `
      <li>
        <ol class="section">
          ${e.children
            .map((node: any) =>
              node.type === "list"
                ? list(node, [...order, n])
                : listItem(node, [...order, ++n])
            )
            .join("")}
        </ol>
      </li>
    `;
}

There are some ad-hoc manipulations, since we need to handle some special functionality, such as adding an active HTML class to the link for the current page, or dealing with the root index.html page.
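The link-handling rules can be isolated into one small, testable function. This sketch assumes refs are POSIX-style paths relative to the content directory. `toHref` and its options object are hypothetical, and unlike the helpers above it resolves the root `index` page before appending `.html`, so `index.md` still maps to `/` with `uglyURLs` enabled.

```typescript
import * as path from "node:path";

// Hypothetical options mirroring the `site.uglyURLs` and `site.rootUrl` settings used above
type HrefOptions = { uglyURLs?: boolean; rootUrl?: string };

// Maps a Markdown ref to the href used in the TOC, e.g. "usage/init.md" -> "/usage/init"
function toHref(ref: string, opts: HrefOptions = {}): string {
  if (ref === "") return ""; // draft chapters have no link at all

  // posix variant so the result uses "/" even on Windows
  const p = path.posix;
  let href = p.join("/", p.dirname(ref), p.basename(ref, ".md"));

  if (href === "/index") {
    href = "/"; // the root index page is served at "/"
  } else if (opts.uglyURLs) {
    href += ".html";
  }

  return `${opts.rootUrl ?? ""}${href}`;
}

console.log(toHref("usage/init.md"), toHref("index.md"), toHref("usage/init.md", { uglyURLs: true }));
```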

Finally, here is the TocRender function in its entirety.


import * as path from "node:path";

// Shape inferred from how `site` is used below
interface Site {
  paths: { content: string };
  uglyURLs?: boolean;
  rootUrl?: string;
}

function TocRender(site: Site, ast: any, currentFileName: string) {
  // URL of the page being rendered, used to mark its TOC entry as active
  const activePage = formatUrl(currentFileName);

  function formatUrl(currentFileName: string) {
    let link = stripExtension(currentFileName);
    link = link.replace(site.paths.content, "");

    return link;
  }

  // "usage/init.md" -> "/usage/init" (or "/usage/init.html" with uglyURLs)
  function stripExtension(url: string) {
    let link = path.join("/", path.dirname(url), path.basename(url, ".md"));

    if (site.uglyURLs) {
      link += ".html";
    }

    return link;
  }

  function hr() {
    return '<li class="spacer"></li>';
  }

  function header(e: any) {
    return `<li class="part-title">${e.text}</li>`;
  }

  function link(e: any): string {
    let ref = e.ref !== "" ? stripExtension(e.ref) : "";
    const linkClass = ref === activePage ? "active" : "";

    // We handle the root index.md differently: it is served at "/"
    if (ref === "/index") {
      ref = "/";
    }

    // Only prefix real links, so drafts (empty ref) stay drafts
    if (ref && site.rootUrl) {
      ref = `${site.rootUrl}${ref}`;
    }

    return ref
      ? `<li><a class="${linkClass}" href="${ref}">${e.title}</a></li>`
      : `<li class="draft">${e.title}</li>`;
  }

  function listItem(e: any, order: number[]) {
    let ref = e.ref !== "" ? stripExtension(e.ref) : "";
    const linkClass = ref === activePage ? "active" : "";

    // We handle the root index.md differently: it is served at "/"
    if (ref === "/index") {
      ref = "/";
    }

    if (ref && site.rootUrl) {
      ref = `${site.rootUrl}${ref}`;
    }

    // [2, 1] -> "2.1."
    const label = [...order, ""].join(".");

    return ref
      ? `
      <li class="chapter-item">
        <strong>${label}</strong>
        &nbsp;
        <a class="${linkClass}"
           href="${ref}">${e.title}</a>
      </li>
    `
      : `
      <li class="chapter-item draft">
        <strong>${label}</strong>
        &nbsp;
        ${e.title}
      </li>
    `;
  }

  function list(e: any, order: number[]): string {
    // Count list items only, so a nested list continues the number of the item before it
    let n = 0;
    return `
        <li>
          <ol class="section">
            ${e.children
              .map((node: any) =>
                node.type === "list"
                  ? list(node, [...order, n])
                  : listItem(node, [...order, ++n])
              )
              .join("")}
          </ol>
        </li>
      `;
  }

  let order = 0; // number of top-level list items rendered so far
  return `<ol class="toc">${ast.children
    .map((e: any) => {
      switch (e.type) {
        case "hr":
          return hr();
        case "header":
          return header(e);
        case "link":
          return link(e);
        case "listItem":
          order += 1;
          return listItem(e, [order]);
        case "list":
          // A nested list continues the numbering of the preceding item
          return list(e, [order]);
        default:
          return "";
      }
    })
    .join("")}</ol>`;
}
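The chapter numbers (`1.`, `2.1.` and so on) are easy to get subtly wrong, so it can help to test that logic in isolation. Below is a standalone sketch that assumes node shapes like those in the AST above, reduced to `title`; `numberItems`, `NumItem` and `NumList` are hypothetical names. It counts only list items at each level, so a nested list hangs off the item just before it.

```typescript
type NumItem = { type: "listItem"; title: string };
type NumList = { type: "list"; children: Array<NumItem | NumList> };

// Returns "number title" lines, e.g. "1. About", "1.1. Setup"
function numberItems(list: NumList, prefix: number[] = []): string[] {
  const lines: string[] = [];
  let n = 0; // how many items we have seen at this level
  for (const node of list.children) {
    if (node.type === "listItem") {
      n += 1;
      lines.push(`${[...prefix, n, ""].join(".")} ${node.title}`);
    } else {
      // A nested list belongs to the most recent item at this level
      lines.push(...numberItems(node, [...prefix, n]));
    }
  }
  return lines;
}

const toc: NumList = {
  type: "list",
  children: [
    { type: "listItem", title: "About" },
    {
      type: "list",
      children: [
        { type: "listItem", title: "Setup" },
        { type: "list", children: [{ type: "listItem", title: "Deep" }] },
        { type: "listItem", title: "Config" },
      ],
    },
    { type: "listItem", title: "Usage" },
  ],
};
console.log(numberItems(toc).join("\n"));
```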



6: Results

Finally, we have our HTML. Here is an overview of the steps we took and what the data structure looks like at each stage.

Fig.1 – Flowchart



Final Thoughts

There are several important parts of writing transpilers that I've omitted:

  • Proper error handling (how users writing the DSL get information about syntax and grammar errors)
  • Writing tests
  • Improving performance
  • Being a bit more idiomatic in how the parser generates the AST
  • Switching to a recursive state function for the Lexer and Parser

But for a pet project, this will suffice. I've also learned a number of lessons along the way while writing transpilers:

  • Start with the lexical grammar and the AST definition, not with programming!
  • Transpilers are an excellent candidate for unit testing
  • When debugging, make sure your lexer behaves! Don't assume the lexer is correct when debugging the parser
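Since each transpiler stage is a pure string-in, data-out function, table-driven tests fit it well. This sketch uses Node's built-in `node:assert` module. The small `lexTypes` function is a hypothetical stand-in for the real lexer so the snippet runs on its own; in practice you would call `TocLexer` and collect token types instead.

```typescript
import * as assert from "node:assert";

const SINGLE: Record<string, string> = {
  "[": "LEFT_BRACE",
  "]": "RIGHT_BRACE",
  "(": "LEFT_PAREN",
  ")": "RIGHT_PAREN",
};

// Stand-in lexer: one token per bracket/paren, one STRING per run of other characters
function lexTypes(src: string): string[] {
  const types: string[] = [];
  let inString = false;
  for (const c of src) {
    const single = SINGLE[c];
    if (single !== undefined) {
      types.push(single);
      inString = false;
    } else if (!inString) {
      types.push("STRING");
      inString = true;
    }
  }
  types.push("EOF");
  return types;
}

// Each case pairs an input with the token types we expect
const cases: Array<[string, string[]]> = [
  ["[About](index.md)", ["LEFT_BRACE", "STRING", "RIGHT_BRACE", "LEFT_PAREN", "STRING", "RIGHT_PAREN", "EOF"]],
  ["[Getting Started]()", ["LEFT_BRACE", "STRING", "RIGHT_BRACE", "LEFT_PAREN", "RIGHT_PAREN", "EOF"]],
  ["", ["EOF"]],
];

for (const [input, expected] of cases) {
  assert.deepStrictEqual(lexTypes(input), expected, `lexing ${JSON.stringify(input)}`);
}
console.log(`${cases.length} lexer cases passed`);
```

Adding a regression case is then a one-line change, which pairs well with the advice to verify the lexer before debugging the parser.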



Resources

Some useful resources if you want to learn more about compilers, transpilers, and parsers in general.
