By now, you are probably familiar with one or more programming languages. But have you ever wondered how you could create your own programming language? And by that, I mean:
A programming language is any set of rules that converts strings to various kinds of machine code output.
In short, a programming language is just a set of predefined rules. To make them useful, you need something that understands those rules, such as a compiler or an interpreter. So we can define some rules and then use any existing programming language to write a program that understands them. That program will be our interpreter.
Compiler
A compiler converts source code into machine code that the processor can execute (e.g. a C++ compiler).
Interpreter
An interpreter goes through the program line by line and executes each command.
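To make the difference concrete, here is a minimal sketch under stated assumptions. The one-rule `say <text>` language and the function names `interpret` and `compileToJs` are made up for this illustration, and JavaScript source text stands in for machine code.

```javascript
// Hypothetical one-rule language: every line has the form `say <text>`.
const source = "say hi\nsay bye"

// Interpreter style: walk the program line by line and act on each command right away.
function interpret(src) {
  const output = []
  for (const line of src.split("\n")) {
    if (line.startsWith("say ")) output.push(line.slice(4))
  }
  return output
}

// Compiler style: translate the whole program into another language first.
// JavaScript source text is used here as a stand-in for machine code.
function compileToJs(src) {
  return src
    .split("\n")
    .filter((line) => line.startsWith("say "))
    .map((line) => `console.log(${JSON.stringify(line.slice(4))})`)
    .join("\n")
}

console.log(interpret(source))   // the commands are carried out directly
console.log(compileToJs(source)) // the commands are only translated, not run
```

The interpreter produces results immediately, while the compiler only produces a new program that still has to be run by something else.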
Want to give it a try? Let’s create a super simple programming language together that prints magenta-colored output in the console. We’ll call it Magenta.

Setting up our programming language
I’m going to use Node.js, but you can use any language to follow along, as the concept remains the same. Let me start by creating an index.js file and setting things up.
class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  run() {
    console.log(this.codes)
  }
}
// For now, we're storing codes in a string variable called `codes`
// Later, we will read codes from a file
const codes =
`print "hello world"
print "hello again"`
const magenta = new Magenta(codes)
magenta.run()
What we’re doing here is declaring a class called Magenta. That class defines and initiates an object that is responsible for logging to the console whatever text we provide it via a codes variable. And, for the time being, we’ve defined that codes variable directly in the file with a couple of “hello” messages.

OK, now we need to create what’s called a Lexer.
What’s a Lexer?
OK, let’s talk about the English language for a second. Take the following phrase:
How are you?
Here, “How” is an adverb, “are” is a verb, and “you” is a pronoun. We also have a question mark (“?”) at the end. We can divide any sentence or phrase like this into many grammatical components in JavaScript. Another way we can distinguish these parts is to divide them into small tokens. The program that divides the text into tokens is our Lexer.
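As a rough sketch of that idea, the snippet below splits the phrase into tokens. The function name `tokenizeSentence` and the token types `word` and `punctuation` are assumptions made up for this example; they are not part of Magenta.

```javascript
// Hypothetical helper: split an English phrase into word and punctuation tokens.
function tokenizeSentence(text) {
  const tokens = []
  let pos = 0
  while (pos < text.length) {
    const char = text[pos]
    if (char === " ") {
      // spaces only separate tokens, so skip them
      pos++
    } else if (/[A-Za-z]/.test(char)) {
      // collect consecutive letters into one word token
      let word = ""
      while (pos < text.length && /[A-Za-z]/.test(text[pos])) {
        word += text[pos]
        pos++
      }
      tokens.push({ type: "word", value: word })
    } else {
      // anything else is treated as a single punctuation token
      tokens.push({ type: "punctuation", value: char })
      pos++
    }
  }
  return tokens
}

console.log(tokenizeSentence("How are you?"))
// three word tokens ("How", "are", "you") followed by one punctuation token ("?")
```

A Lexer for a programming language works the same way, only with token types that make sense for code instead of English.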

Since our language is very tiny, it only has two types of tokens, each with a value:
keyword
string
We could’ve used a regular expression to extract tokens from the codes string, but that can get slow and hard to maintain as the language grows. A better approach is to loop through each character of the codes string and grab tokens. So, let’s create a tokenize method in our Magenta class, which will be our Lexer.
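For comparison, here is a sketch of what the regular-expression alternative might look like. It is not the approach this article takes. The function name `tokenizeWithRegex` and the pattern are assumptions for illustration, and unlike the Lexer built here it does not check words against a list of built-in keywords.

```javascript
// Hypothetical regex-based tokenizer, shown only for comparison.
function tokenizeWithRegex(codes) {
  // Skip whitespace, then match either a "quoted string" or a bare word.
  // The `y` (sticky) flag forces each match to start exactly at lastIndex.
  const pattern = /\s*(?:"([^"\n]*)"|([A-Za-z_]+))/y
  const source = codes.trimEnd()
  const tokens = []
  let pos = 0
  while (pos < source.length) {
    pattern.lastIndex = pos
    const match = pattern.exec(source)
    if (!match) {
      return { error: `Unexpected input at position ${pos}` }
    }
    if (match[1] !== undefined) {
      // the first capture group holds the text between the quotes
      tokens.push({ type: "string", value: match[1] })
    } else {
      // the second capture group holds a bare word
      tokens.push({ type: "keyword", value: match[2] })
    }
    pos = pattern.lastIndex
  }
  return { error: false, tokens }
}

console.log(tokenizeWithRegex('print "hello world"\nprint "hello again"'))
```

The pattern is compact for two token types, but every new token type means another branch in the same expression, which is part of why a character loop is easier to grow.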
Full code
class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize() {
    const length = this.codes.length
    // pos keeps track of current position/index
    let pos = 0
    let tokens = []
    const BUILT_IN_KEYWORDS = ["print"]
    // allowed characters for variable/keyword
    const varChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
    while (pos < length) {
      let currentChar = this.codes[pos]
      // if current char is space or newline, continue
      if (currentChar === " " || currentChar === "\n") {
        pos++
        continue
      } else if (currentChar === '"') {
        // if current char is " then we have a string
        let res = ""
        pos++
        // while next char is not " or \n and we are not at the end of the code
        while (pos < length && this.codes[pos] !== '"' && this.codes[pos] !== '\n') {
          // adding the char to the string
          res += this.codes[pos]
          pos++
        }
        // if the loop ended at a newline or the end of the code and we didn't find the closing "
        if (this.codes[pos] !== '"') {
          return {
            error: `Unterminated string`
          }
        }
        pos++
        // adding the string to the tokens
        tokens.push({
          type: "string",
          value: res
        })
      } else if (varChars.includes(currentChar)) {
        // if current char is a valid variable/keyword character
        let res = currentChar
        pos++
        // while the next char is a valid variable/keyword character
        while (pos < length && varChars.includes(this.codes[pos])) {
          // adding the char to the string
          res += this.codes[pos]
          pos++
        }
        // if the keyword is not a built-in keyword
        if (!BUILT_IN_KEYWORDS.includes(res)) {
          return {
            error: `Unexpected token ${res}`
          }
        }
        // adding the keyword to the tokens
        tokens.push({
          type: "keyword",
          value: res
        })
      } else { // we have an invalid character in our code
        return {
          error: `Unexpected character ${this.codes[pos]}`
        }
      }
    }
    // returning the tokens
    return {
      error: false,
      tokens
    }
  }
  run() {
    const {
      tokens,
      error
    } = this.tokenize()
    if (error) {
      console.log(error)
      return
    }
    console.log(tokens)
  }
}
If we run this in a terminal with node index.js, we should see a list of tokens printed in the console.

Defining rules and syntaxes
We want to see if the order of our codes matches some sort of rule or syntax. But first we need to define what those rules and syntaxes are. Since our language is so tiny, it only has one simple syntax, which is a print keyword followed by a string.
keyword:print string
So let’s create a parse method that loops through our tokens and checks whether they form a valid syntax. If so, it will take the necessary actions.
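One way to picture the rule keyword:print string is as a small piece of data that a token sequence either matches or does not. The sketch below assumes tokens shaped as objects with a type and a value; the names `PRINT_RULE` and `matchesRule` are hypothetical and only serve this illustration.

```javascript
// Hypothetical representation of the rule `keyword:print string` as data.
const PRINT_RULE = [{ type: "keyword", value: "print" }, { type: "string" }]

// Returns true when the tokens starting at `pos` match every part of `rule`.
function matchesRule(tokens, pos, rule) {
  return rule.every((expected, offset) => {
    const token = tokens[pos + offset]
    // a missing token or a wrong type breaks the rule
    if (!token || token.type !== expected.type) return false
    // a rule part without a value accepts any value of that type
    return expected.value === undefined || token.value === expected.value
  })
}

const valid = [
  { type: "keyword", value: "print" },
  { type: "string", value: "hello world" }
]
const invalid = [
  { type: "string", value: "hello world" },
  { type: "keyword", value: "print" }
]

console.log(matchesRule(valid, 0, PRINT_RULE))   // true
console.log(matchesRule(invalid, 0, PRINT_RULE)) // false
```

With a single rule, writing the checks inline is simpler, but describing rules as data can help once a language has more than a handful of them.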
class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize() {
    /* previous codes for tokenizer */
  }
  parse(tokens) {
    const len = tokens.length
    let pos = 0
    while (pos < len) {
      const token = tokens[pos]
      // if token is a print keyword
      if (token.type === "keyword" && token.value === "print") {
        // if the next token doesn't exist
        if (!tokens[pos + 1]) {
          return console.log("Unexpected end of line, expected string")
        }
        // check if the next token is a string
        let isString = tokens[pos + 1].type === "string"
        // if the next token is not a string
        if (!isString) {
          return console.log(`Unexpected token ${tokens[pos + 1].type}, expected string`)
        }
        // if we reach this point, we have valid syntax
        // so we can print the string
        console.log('\x1b[35m%s\x1b[0m', tokens[pos + 1].value)
        // we add 2 because we also check the token after print keyword
        pos += 2
      } else { // if we didn't match any rules
        return console.log(`Unexpected token ${token.type}`)
      }
    }
  }
  run() {
    const { tokens, error } = this.tokenize()
    if (error) {
      console.log(error)
      return
    }
    this.parse(tokens)
  }
}
And would you look at that: we already have a working language!
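The magenta color comes from ANSI escape sequences in the format string passed to console.log inside parse: the code 35 selects a magenta foreground and the code 0 resets the styling. As a small sketch, the same effect can be wrapped in a helper. The names `MAGENTA`, `RESET`, and `magentaText` are made up for this example, and it assumes a terminal that supports ANSI colors.

```javascript
// ANSI escape sequences: ESC (\x1b) followed by "[<code>m"
const MAGENTA = "\x1b[35m" // switch the foreground color to magenta
const RESET = "\x1b[0m"    // reset all styling back to the default

// Hypothetical helper that wraps text so only that text is colored.
function magentaText(text) {
  return `${MAGENTA}${text}${RESET}`
}

console.log(magentaText("hello world"))
```

Without the reset code at the end, everything printed afterwards would stay magenta as well.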

Okay, but having codes in a string variable is not that fun. So let’s put our Magenta codes in a file called code.m. That way we can keep our Magenta codes separate from the interpreter logic. We’re using .m as the file extension to indicate that this file contains code for our language.
Let’s read the code from that file:
// importing file system module
const fs = require('fs')
// importing path module for convenient path joining
const path = require('path')
class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize() {
    /* previous codes for tokenizer */
  }
  parse(tokens) {
    /* previous codes for parse method */
  }
  run() {
    /* previous codes for run method */
  }
}
// Reading code.m file
// Some text editors use \r\n for new line instead of \n, so we're removing \r
const codes = fs.readFileSync(path.join(__dirname, 'code.m'), 'utf8').replace(/\r/g, "")
const magenta = new Magenta(codes)
magenta.run()
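To see why the \r characters are removed, here is a small self-contained sketch. It writes a file with Windows-style line endings into a temporary directory, reads it back, and strips the carriage returns. The scratch directory and file are created only for this demonstration and are assumptions, not part of the Magenta project.

```javascript
const fs = require("fs")
const os = require("os")
const path = require("path")

// Hypothetical scratch directory and file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "magenta-"))
const file = path.join(dir, "code.m")

// Simulate a file saved by an editor that ends lines with \r\n.
fs.writeFileSync(file, 'print "hello world"\r\nprint "hello again"\r\n')

const raw = fs.readFileSync(file, "utf8")
// Strip every carriage return so only \n remains as the line separator.
const codes = raw.replace(/\r/g, "")

console.log(JSON.stringify(raw))   // still contains \r\n pairs
console.log(JSON.stringify(codes)) // contains only \n

// Clean up the scratch directory.
fs.rmSync(dir, { recursive: true, force: true })
```

Since the Lexer only skips spaces and \n, a leftover \r would otherwise be reported as an unexpected character.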
Go create a programming language!
And with that, we’ve successfully created a tiny programming language from scratch. See, a programming language can be as simple as something that accomplishes one specific thing. Sure, it’s unlikely that a language like Magenta here will ever be useful enough to be part of a popular framework or anything, but now you see what it takes to make one.
The sky is really the limit. If you want to dive in a little deeper, try following along with this video I made going over a more advanced example. In that video, I’ve also shown how you can add variables to your language.