Let’s Create a Tiny Programming Language | CSS-Tricks

By now, you are probably familiar with one or more programming languages. But have you ever wondered how you could create your own programming language? And by that, I mean:

A programming language is any set of rules that converts strings into various kinds of machine code output.

In short, a programming language is just a set of predefined rules. To make those rules useful, you need something that understands them: a compiler, an interpreter, and so on. So we can simply define some rules and then, to make them work, use any existing programming language to write a program that understands those rules. That program will be our interpreter.

Compiler

A compiler converts source code into machine code that the processor can execute (e.g. a C++ compiler).

Interpreter

An interpreter goes through the program line by line and executes each command.
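To make the "line by line" idea concrete, here is a minimal sketch of an interpreter loop. It is not the Magenta implementation built in this article: the `interpret` function and the `say` command are made up purely for illustration.

```javascript
// Hypothetical toy interpreter (not Magenta): each line is either blank
// or a made-up `say <text>` command.
function interpret(source) {
  const output = []
  for (const line of source.split("\n")) {
    const trimmed = line.trim()
    // skip blank lines
    if (trimmed === "") continue
    if (trimmed.startsWith("say ")) {
      // the command is executed right away, before the next line is read
      output.push(trimmed.slice(4))
    } else {
      throw new Error(`Unknown command: ${trimmed}`)
    }
  }
  return output
}

console.log(interpret("say hi\nsay bye"))
```

A compiler would instead translate the whole program into another form first and leave the execution for later.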

Want to give it a try? Let’s create a super simple programming language together that prints magenta-colored output in the console. We’ll call it Magenta.

Screenshot of terminal output in color magenta.
Our simple programming language creates a codes variable that contains text that gets printed to the console… in magenta, of course.

Setting up our programming language

I’m going to use Node.js, but you can use any language to follow along; the concept will remain the same. Let me start by creating an index.js file and setting things up.

class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  run() {
    console.log(this.codes)
  }
}

// For now, we're storing codes in a string variable called `codes`
// Later, we will read codes from a file
const codes = 
`print "hello world"
print "hello again"`
const magenta = new Magenta(codes)
magenta.run()

What we’re doing here is declaring a class called Magenta. That class defines and instantiates an object that is responsible for logging to the console whatever text we provide it via a codes variable. And, for the time being, we’ve defined that codes variable directly in the file with a couple of “hello” messages.

Screenshot of terminal output.
If we were to run this code we would get the text stored in codes logged in the console.

OK, now we need to create what’s called a Lexer.

What’s a Lexer?

OK, let’s talk about the English language for a second. Take the following phrase:

How are you?

Here, “How” is an adverb, “are” is a verb, and “you” is a pronoun. We also have a question mark (“?”) at the end. We can divide any sentence or phrase like this into its grammatical components in JavaScript. Another way to distinguish these parts is to divide them into small tokens. The program that divides the text into tokens is our Lexer.
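As a rough illustration of that idea, the sketch below splits the phrase into tokens. The function name `tokenizePhrase` and the token type names "word" and "punctuation" are assumptions made for this example; they are not part of Magenta.

```javascript
// Splits an English phrase into word and punctuation tokens.
// `tokenizePhrase`, "word" and "punctuation" are hypothetical names.
function tokenizePhrase(phrase) {
  const tokens = []
  let pos = 0
  while (pos < phrase.length) {
    const ch = phrase[pos]
    if (ch === " ") {
      // spaces only separate tokens, so skip them
      pos++
    } else if (/[A-Za-z]/.test(ch)) {
      // collect consecutive letters into one word token
      let word = ""
      while (pos < phrase.length && /[A-Za-z]/.test(phrase[pos])) {
        word += phrase[pos]
        pos++
      }
      tokens.push({ type: "word", value: word })
    } else {
      // anything else becomes a one-character punctuation token
      tokens.push({ type: "punctuation", value: ch })
      pos++
    }
  }
  return tokens
}

console.log(tokenizePhrase("How are you?"))
```

The result is three word tokens followed by one punctuation token. A lexer for a programming language works the same way, just with different token types.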

Diagram showing command going through a lexer.

Since our language is very tiny, it only has two types of tokens, each with a value:

  1. keyword
  2. string
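To picture the two token types, here is what the tokens for the line `print "hello world"` could look like as plain objects with a type and a value. The `isToken` helper is hypothetical and only checks that shape.

```javascript
// Tokens we would expect for the source line: print "hello world"
const expectedTokens = [
  { type: "keyword", value: "print" },
  { type: "string", value: "hello world" }
]

// Hypothetical helper: checks that an object looks like one of our two token types
function isToken(obj) {
  return (
    obj !== null &&
    typeof obj === "object" &&
    (obj.type === "keyword" || obj.type === "string") &&
    typeof obj.value === "string"
  )
}

console.log(expectedTokens.every(isToken))
```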

We could’ve used a regular expression to extract tokens from the codes string, but the performance would likely be slower. A better approach is to loop through each character of the codes string and grab tokens as we go. So, let’s create a tokenize method in our Magenta class, which will be our Lexer.
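For comparison, here is a sketch of what the regular-expression route could look like. This is not the approach used in this article, the names `TOKEN_PATTERN` and `tokenizeWithRegex` are made up, and no performance measurement is implied.

```javascript
// Regex-based alternative to a character loop (a sketch only).
// One capture group per kind of match:
//   group 1: the contents of a double-quoted string on a single line
//   group 2: a run of letters/underscores (a keyword candidate)
//   group 3: any other non-whitespace character (always an error)
const TOKEN_PATTERN = /"([^"\n]*)"|([A-Za-z_]+)|(\S)/g

function tokenizeWithRegex(codes) {
  const tokens = []
  for (const match of codes.matchAll(TOKEN_PATTERN)) {
    if (match[1] !== undefined) {
      tokens.push({ type: "string", value: match[1] })
    } else if (match[2] !== undefined) {
      // `print` is the only keyword in our language
      if (match[2] !== "print") {
        return { error: `Unexpected token ${match[2]}` }
      }
      tokens.push({ type: "keyword", value: match[2] })
    } else {
      // note: an unterminated string ends up here, reported as a stray quote
      return { error: `Unexpected character ${match[3]}` }
    }
  }
  return { error: false, tokens }
}

console.log(tokenizeWithRegex('print "hello world"'))
```

One trade-off is visible already: error reporting is coarser, because a missing closing quote is reported as an unexpected character rather than an unterminated string.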

class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize() {
    const length = this.codes.length
    // pos keeps track of current position/index
    let pos = 0
    let tokens = []
    const BUILT_IN_KEYWORDS = ["print"]
    // allowed characters for variable/keyword
    const varChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
    while (pos < length) {
      let currentChar = this.codes[pos]
      // if current char is space or newline, continue
      if (currentChar === " " || currentChar === "\n") {
        pos++
        continue
      } else if (currentChar === '"') {
        // if current char is " then we have a string
        let res = ""
        pos++
        // while next char is not " or \n and we are not at the end of the code
        while (pos < length && this.codes[pos] !== '"' && this.codes[pos] !== "\n") {
          // adding the char to the string
          res += this.codes[pos]
          pos++
        }
        // if the loop ended because of the end of the code (or line) and we didn't find the closing "
        if (this.codes[pos] !== '"') {
          return {
            error: `Unterminated string`
          }
        }
        pos++
        // adding the string to the tokens
        tokens.push({
          type: "string",
          value: res
        })
      } else if (varChars.includes(currentChar)) {
        // if current char is a valid variable/keyword character
        let res = currentChar
        pos++
        // while the next char is a valid variable/keyword character
        while (pos < length && varChars.includes(this.codes[pos])) {
          // adding the char to the string
          res += this.codes[pos]
          pos++
        }
        // if the keyword is not a built-in keyword
        if (!BUILT_IN_KEYWORDS.includes(res)) {
          return {
            error: `Unexpected token ${res}`
          }
        }
        // adding the keyword to the tokens
        tokens.push({
          type: "keyword",
          value: res
        })
      } else { // we have an invalid character in our code
        return {
          error: `Unexpected character ${this.codes[pos]}`
        }
      }
    }
    // returning the tokens
    return {
      error: false,
      tokens
    }
  }
  run() {
    const {
      tokens,
      error
    } = this.tokenize()
    if (error) {
      console.log(error)
      return
    }
    console.log(tokens)
  }
}

If we run this in a terminal with node index.js, we should see a list of tokens printed in the console.

Screenshot of code.
Nice stuff!

Defining rules and syntaxes

We want to see if the order of our codes matches some sort of rule or syntax. But first we need to define what those rules and syntaxes are. Since our language is so tiny, it only has one simple syntax, which is a print keyword followed by a string.

keyword:print string

So let’s create a parse method that loops through our tokens and checks whether they form a valid syntax. If so, it will take the necessary actions.

class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize(){
    /* previous code for tokenizer */
  }
  parse(tokens){
    const len = tokens.length
    let pos = 0
    while(pos < len) {
      const token = tokens[pos]
      // if token is a print keyword
      if(token.type === "keyword" && token.value === "print") {
        // if the next token doesn't exist
        if(!tokens[pos + 1]) {
          return console.log("Unexpected end of line, expected string")
        }
        // check if the next token is a string
        let isString = tokens[pos + 1].type === "string"
        // if the next token is not a string
        if(!isString) {
          return console.log(`Unexpected token ${tokens[pos + 1].type}, expected string`)
        }
        // if we reach this point, we have valid syntax
        // so we can print the string, wrapped in ANSI codes for magenta and reset
        console.log('\x1b[35m%s\x1b[0m', tokens[pos + 1].value)
        // we add 2 because we also checked the token after the print keyword
        pos += 2
      } else{ // if we didn't match any rules
        return console.log(`Unexpected token ${token.type}`)
      }
    }
  }
  run(){
    const {tokens, error} = this.tokenize()
    if(error){
      console.log(error)
      return
    }
    this.parse(tokens)
  }
}
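The magenta color comes from the format string passed to console.log in the parse method, which wraps the text in ANSI escape sequences: the escape character (\x1b) followed by [35m switches the foreground color to magenta, and \x1b[0m resets it. A minimal sketch of the same idea as a reusable helper follows; the names `MAGENTA`, `RESET` and `colorize` are assumptions for this example.

```javascript
// ANSI escape sequences understood by most terminals
const MAGENTA = "\x1b[35m" // switch the foreground color to magenta
const RESET = "\x1b[0m"    // reset all colors and styles

// Hypothetical helper: wraps text so that only this text is colored
function colorize(text, color = MAGENTA) {
  return `${color}${text}${RESET}`
}

console.log(colorize("hello world"))
```

Resetting after the text matters, because otherwise everything printed afterwards would stay magenta.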

And would you look at that: we already have a working language!

Screenshot of terminal output.
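The parse method above checks the syntax and prints in the same loop. One possible variation, sketched below under assumed names (`parseToStatements`, `execute`, and the `kind` and `text` fields are all hypothetical), is to have the parser return a list of statements and run them in a separate step. This is not how the article's code is organized; it only shows the same rule applied with checking and execution kept apart.

```javascript
// Hypothetical variant of parse(): returns data instead of printing
function parseToStatements(tokens) {
  const statements = []
  let pos = 0
  while (pos < tokens.length) {
    const token = tokens[pos]
    const next = tokens[pos + 1]
    if (token.type === "keyword" && token.value === "print") {
      if (!next) {
        return { error: "Unexpected end of line, expected string" }
      }
      if (next.type !== "string") {
        return { error: `Unexpected token ${next.type}, expected string` }
      }
      // rule matched: keyword:print string
      statements.push({ kind: "print", text: next.value })
      pos += 2
    } else {
      return { error: `Unexpected token ${token.type}` }
    }
  }
  return { error: false, statements }
}

// Hypothetical executor: runs the statements produced by the parser
function execute(statements) {
  for (const statement of statements) {
    if (statement.kind === "print") {
      console.log("\x1b[35m%s\x1b[0m", statement.text)
    }
  }
}

const result = parseToStatements([
  { type: "keyword", value: "print" },
  { type: "string", value: "hello world" }
])
if (result.error) {
  console.log(result.error)
} else {
  execute(result.statements)
}
```

With this split, a syntax error anywhere in the file is reported before anything is printed.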

Okay, but having codes in a string variable is not that fun. So let’s put our Magenta codes in a file called code.m. That way we can keep our Magenta codes separate from the interpreter logic. We’re using .m as the file extension to indicate that this file contains code for our language.

Let’s read the code from that file:

// importing file system module
const fs = require('fs')
// importing path module for convenient path joining
const path = require('path')
class Magenta{
  constructor(codes){
    this.codes = codes
  }
  tokenize(){
    /* previous code for tokenizer */
  }
  parse(tokens){
    /* previous code for parse method */
  }
  run(){
    /* previous code for run method */
  }
}

// Reading code.m file
// Some text editors use \r\n for new line instead of \n, so we're removing \r
const codes = fs.readFileSync(path.join(__dirname, 'code.m'), 'utf8').replace(/\r/g, "")
const magenta = new Magenta(codes)
magenta.run()

Go create a programming language!

And with that, we have successfully created a tiny programming language from scratch. See, a programming language can be as simple as something that accomplishes one specific thing. Sure, it’s unlikely that a language like Magenta here will ever be useful enough to be part of a popular framework or anything, but now you see what it takes to make one.

The sky is really the limit. If you want to dive in a little deeper, try following along with this video I made going over a more advanced example. In that video I also show how you can add variables to your language.
