
Devanagari Transliteration Pipeline for LaTeX



Write in Devanagari to render as IAST, Harvard-Kyoto, Velthuis, SLP1, WX and so forth.


Devanagari is the fourth most widely adopted writing system in the world, primarily used in the Indian subcontinent. The script is used for more than 120 languages, some of the more notable being Sanskrit, Hindi, Marathi, Pali, Nepali and several variations of these languages.

Devanagari text can be transliterated into various popular schemes. There exist several input systems based on these transliteration schemes that let users enter the text easily. More often than not, a user has a preferred scheme to type the input in. Similarly, at times, one needs to render the text in a different scheme in the PDF document.

In my case, I prefer using ibus-m17n to type text in Devanagari. While writing articles that contain Devanagari text, I also faced the need to render the text as IAST in the final PDF.
One could always learn to type in another input scheme, but that gets tedious. Similarly, transliterating each word using online systems such as Aksharamukha can also be a tedious task. So, I was looking for a way to type in Devanagari and have the text rendered in IAST after PDF compilation. As a solution, I came up with a system consisting of a small set of LaTeX commands that add custom syntax to LaTeX, and a Python transliteration script (based on the indic-transliteration package) that serves as a middle layer: it processes the LaTeX file and creates a new LaTeX file with the proper transliteration.



LaTeX Compilation System with Transliteration Support

There are two major components to the system:

  1. LaTeX Syntax
  2. Transliteration Script



LaTeX Syntax

XeTeX (xelatex) and LuaTeX (lualatex) have good Unicode support and can be used to write Devanagari text. In the current example, I describe the setup with XeTeX.

We first add the required packages to the preamble of the LaTeX (.tex) file.

```latex
% This assumes your files are encoded as UTF8
\usepackage[utf8]{inputenc}

% Devanagari Related Packages
\usepackage{fontspec, xunicode, xltxtra}
```

Using fontspec, we can define font families for writing text in specific scripts. To write Devanagari text, one needs to have a Devanagari font available. (It is assumed here that one may need to write both in Devanagari and in other transliteration schemes.)

For more on Devanagari fonts, you may check the fonts section of this document. In this section, it is assumed that the Sanskrit 2003 font is installed on the system.

To define the font families and commands mentioned earlier, we add the following lines to the preamble.

```latex
% Define Fonts
\newfontfamily\textskt[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\textiast[Script=Latin]{Sanskrit 2003}

% Commands for Devanagari Transliterations
\newcommand{\skt}[1]{{\textskt{#1}}}
\newcommand{\iast}[1]{{\textiast{#1}}}
\newcommand{\Iast}[1]{{\textiast{#1}}}
\newcommand{\IAST}[1]{{\textiast{#1}}}
```

This provides us with four commands. \skt{} can be used to render Devanagari text. \iast{}, \Iast{} and \IAST{} can be used to mark Devanagari text that should be rendered in IAST in lower case, title case and upper case respectively. It should be noted that from the perspective of the LaTeX engine, the commands \iast{}, \Iast{} and \IAST{} are identical. They differ only syntactically, which lets the Python script perform the transliteration and apply the appropriate case modifications.
It should further be noted that we can define new font families and new commands for any of the valid schemes as required, which could give us additional commands such as \velthuis{}, \hk{} and so on.
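Each additional scheme needs the same pair of preamble lines. As a rough sketch (the helper name scheme_commands and its default font are assumptions, not part of the original setup), these lines could be generated in Python. The sketch also guards against one LaTeX pitfall: a control word may contain letters only, so a scheme name such as slp1 cannot be used directly as \slp1{} (TeX would read \slp followed by the character 1).

```python
import re
from typing import Iterable

# LaTeX control words (e.g. \hk) may consist of letters only
CONTROL_WORD = re.compile(r"[A-Za-z]+")


def scheme_commands(schemes: Iterable[str], font: str = "Sanskrit 2003") -> str:
    r"""Build preamble lines defining a font family and a command per scheme.

    Hypothetical helper: for "hk" it emits
        \newfontfamily\texthk[Script=Latin]{Sanskrit 2003}
        \newcommand{\hk}[1]{{\texthk{#1}}}
    """
    lines = []
    for name in schemes:
        if not CONTROL_WORD.fullmatch(name):
            raise ValueError(f"{name!r} cannot be used as a LaTeX command name")
        # Font switch, mirroring \newfontfamily\textiast from the preamble above
        lines.append("\\newfontfamily\\text" + name + "[Script=Latin]{" + font + "}")
        # One-argument command wrapping its content in that font
        lines.append("\\newcommand{\\" + name + "}[1]{{\\text" + name + "{#1}}}")
    return "\n".join(lines)


print(scheme_commands(["hk", "velthuis"]))
```

Since the tag used by the script is derived from the scheme name, a scheme whose name contains digits would need a separately chosen command name and matching handling in the script.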



Minimal Example

Equipped with these commands and some Devanagari text, we have the following minimal example, saved in the file minimal.tex:

```latex
\documentclass[10pt]{article}

% This assumes your files are encoded as UTF8
\usepackage[utf8]{inputenc}

% Devanagari Related Packages
\usepackage{fontspec, xunicode, xltxtra}

% Define Fonts
\newfontfamily\textskt[Script=Devanagari]{Sanskrit 2003}
\newfontfamily\textiast[Script=Latin]{Sanskrit 2003}

% Commands for Devanagari Transliterations
\newcommand{\skt}[1]{{\textskt{#1}}}
\newcommand{\iast}[1]{{\textiast{#1}}}
\newcommand{\Iast}[1]{{\textiast{#1}}}
\newcommand{\IAST}[1]{{\textiast{#1}}}

\title{Transliteration of Devanagari Text}
\author{Hrishikesh Terdalkar}

\begin{document}

\maketitle

\skt{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\iast{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\Iast{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\IAST{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\end{document}
```



Transliteration Script

The Python script is used to perform the transliteration and some clean-up on the LaTeX source.

```shell
python3 finalize.py minimal.tex final.tex
```
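The full finalize.py is not reproduced here. As a hypothetical sketch of its overall shape (run_pipeline and main are names assumed for illustration), the script can be thought of as reading the input file, passing the text through a chain of string-to-string steps, and writing the result to the output file:

```python
import argparse
from pathlib import Path
from typing import Callable, Iterable, Optional, Sequence

# A step takes the whole LaTeX source and returns the modified source
Step = Callable[[str], str]


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        text = step(text)
    return text


def main(argv: Optional[Sequence[str]] = None, steps: Iterable[Step] = ()) -> None:
    parser = argparse.ArgumentParser(
        description="Resolve transliteration tags in a LaTeX file"
    )
    parser.add_argument("input", help="source .tex file, e.g. minimal.tex")
    parser.add_argument("output", help="file to write, e.g. final.tex")
    args = parser.parse_args(argv)

    text = Path(args.input).read_text(encoding="utf-8")
    Path(args.output).write_text(run_pipeline(text, steps), encoding="utf-8")
```

In the real script the steps would presumably be devanagari_to_iast and the comment and whitespace clean-up utilities mentioned later, e.g. main(steps=[devanagari_to_iast]); keeping every step as a plain str -> str function makes the order of steps easy to change.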

This results in the content being transformed in the following way:

```latex
% ...

\skt{को न्वस्मिन् साम्प्रतं लोके गुणवान् कश्च वीर्यवान्।}

\iast

\Iast

\IAST

% ...
```

We can now proceed to compile the final.tex file.

```shell
xelatex final
```

This produces the final output PDF.



Anatomy of the Transliteration Script

At the core of the transliteration script, there is a function, transliterate_between.

```python
import re
from typing import Callable

from indic_transliteration.sanscript import transliterate


def transliterate_between(
    text: str,
    from_scheme: str,
    to_scheme: str,
    start_pattern: str,
    end_pattern: str,
    post_hook: Callable[[str], str] = lambda x: x,
) -> str:
    """Transliterate the text appearing between two patterns

    Only the text appearing between patterns `start_pattern` and `end_pattern`
    is transliterated.
    `start_pattern` and `end_pattern` can appear multiple times in the full
    text, and for every occurrence, the text between them is transliterated.

    `from_scheme` and `to_scheme` should be compatible with scheme names from
    `indic-transliteration`

    Parameters
    ----------
    text : str
        Full text
    from_scheme : str
        Input transliteration scheme
    to_scheme : str
        Output transliteration scheme
    start_pattern : str
        Literal string marking the start tag (escaped before matching)
    end_pattern : str
        Literal string marking the end tag (escaped before matching)
    post_hook : Callable[[str], str], optional
        Function to be applied on the text within tags after transliteration
        The default is `lambda x: x`.

    Returns
    -------
    str
        Text after replacements
    """

    if from_scheme == to_scheme:
        return text

    def transliterate_match(matchobj):
        # Transliterate the captured text and re-wrap it in the original tags
        target = matchobj.group(1)
        replacement = transliterate(target, from_scheme, to_scheme)
        replacement = post_hook(replacement)
        return f"{start_pattern}{replacement}{end_pattern}"

    # Non-greedy group: each tag pair is handled separately;
    # re.DOTALL lets the tagged text span multiple lines
    pattern = "%s(.*?)%s" % (re.escape(start_pattern), re.escape(end_pattern))
    return re.sub(pattern, transliterate_match, text, flags=re.DOTALL)
```

We can provide the start and end patterns as \iast{ and } respectively, to transliterate the text enclosed in these tags.
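To see what transliterate_between actually captures, here is a standalone check of the same pattern construction (re.escape on both tags, a non-greedy group, re.DOTALL), using ASCII placeholders instead of Devanagari. It also exposes a limitation of this approach: the first } closes the match, so tagged content that itself contains braces, such as a nested \textbf{}, is cut short.

```python
import re

start, end = "\\iast{", "}"
# Same construction as in transliterate_between
pattern = "%s(.*?)%s" % (re.escape(start), re.escape(end))

# \Iast is not matched: the pattern is case-sensitive
text = "\\iast{one} and \\iast{two\nlines} and \\Iast{skipped}"
print(re.findall(pattern, text, flags=re.DOTALL))
# → ['one', 'two\nlines']

# Nested braces: the match stops at the first closing brace
nested = "\\iast{a \\textbf{b} c}"
print(re.findall(pattern, nested, flags=re.DOTALL))
# → ['a \\textbf{b']
```

In practice this means formatting commands should be placed outside the transliteration tags rather than inside them.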

Using this function, we can write a generic function that works with any transliteration scheme.

```python
def latex_transliteration(
    input_text: str,
    from_scheme: str,
    to_scheme: str
) -> str:
    r"""Transliterate parts of the LaTeX input enclosed in scheme tags

    A scheme tag is of the form `\to_scheme_lowercase{}` and is used
    when the desired output is in `to_scheme`.

    i.e.,
    - Text for the IAST scheme is enclosed in \iast{} tags
    - Text for the HK scheme is enclosed in \hk{} tags
    - ...

    Parameters
    ----------
    input_text : str
        Input text
    from_scheme : str
        Transliteration scheme of the text written within the input tags
    to_scheme : str
        Transliteration scheme to which the text within tags should be
        transliterated

    Returns
    -------
    str
        Text after replacement of text within the scheme tags
    """
    # e.g. "iast" -> "\iast{" (the doubled brace yields a literal "{")
    start_tag_pattern = f"\\{to_scheme.lower()}{{"
    end_tag_pattern = "}"
    return transliterate_between(
        input_text,
        from_scheme=from_scheme,
        to_scheme=to_scheme,
        start_pattern=start_tag_pattern,
        end_pattern=end_tag_pattern
    )
```

Note: The names of schemes (and therefore the corresponding LaTeX commands) have to conform to the names of schemes used by the indic-transliteration package.

IAST is a case-insensitive transliteration scheme, and as such, we may be interested in specific capitalization of certain words (e.g. proper nouns). We can use the post_hook argument to provide this functionality. Using it, we can create a function to handle the three variants of IAST mentioned previously, namely, \iast{} (lower), \Iast{} (title) and \IAST{} (upper).

```python
from indic_transliteration import sanscript


def devanagari_to_iast(input_text: str) -> str:
    r"""Transliterate parts of the input enclosed in
    \iast{}, \Iast{} or \IAST{} tags from Devanagari to IAST

    Text in \Iast{} tags additionally undergoes a `.title()` post-hook.
    Text in \IAST{} tags additionally undergoes a `.upper()` post-hook.

    Parameters
    ----------
    input_text : str
        Input text

    Returns
    -------
    str
        Text after replacement of text within the IAST tags
    """
    # Lower case: \iast{...} (matching is case-sensitive, so \Iast and \IAST are untouched)
    intermediate_text = transliterate_between(
        input_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\iast{",
        end_pattern="}"
    )
    # Title case: \Iast{...}
    intermediate_text = transliterate_between(
        intermediate_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\Iast{",
        end_pattern="}",
        post_hook=lambda x: x.title()
    )
    # Upper case: \IAST{...}
    final_text = transliterate_between(
        intermediate_text,
        from_scheme=sanscript.DEVANAGARI,
        to_scheme=sanscript.IAST,
        start_pattern="\\IAST{",
        end_pattern="}",
        post_hook=lambda x: x.upper()
    )

    return final_text
```
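devanagari_to_iast scans the document three times, once per tag. A possible alternative, sketched below and not part of the original script, handles all three tags in a single re.sub pass by capturing the tag name and looking up its case hook. The transliterator is passed in as a parameter (an assumption of this sketch) so that it runs without indic-transliteration. The title-case hook also applies Unicode NFC normalization first: str.title() capitalizes any letter that follows a non-cased character, and combining marks such as U+0323 (the dot below in ṛ and ṣ, when stored in decomposed form) are not cased, so decomposed input would otherwise get stray capitals.

```python
import re
import unicodedata
from typing import Callable, Dict


def title_nfc(text: str) -> str:
    # Compose base letter + combining mark (e.g. "r" + U+0323 -> "ṛ") before
    # title-casing, so the mark does not split the word
    return unicodedata.normalize("NFC", text).title()


# Case hook applied after transliteration, keyed by tag name
CASE_HOOKS: Dict[str, Callable[[str], str]] = {
    "iast": lambda s: s,
    "Iast": title_nfc,
    "IAST": str.upper,
}

# Captures the tag name and its (brace-free) content in one pattern
TAG_RE = re.compile(r"\\(iast|Iast|IAST)\{(.*?)\}", re.DOTALL)


def convert_iast_tags(text: str, to_iast: Callable[[str], str]) -> str:
    r"""Transliterate \iast{}, \Iast{} and \IAST{} contents in one pass."""

    def replace(match):
        tag, body = match.group(1), match.group(2)
        return "\\" + tag + "{" + CASE_HOOKS[tag](to_iast(body)) + "}"

    return TAG_RE.sub(replace, text)


# Identity "transliterator", just to show how the tags are handled
print(convert_iast_tags(r"\iast{rama} \Iast{rama} \IAST{rama} \skt{rama}", lambda s: s))
```

With indic-transliteration installed, to_iast could be lambda s: sanscript.transliterate(s, sanscript.DEVANAGARI, sanscript.IAST). The single pattern keeps the three variants in one table, at the cost of hard-coding the tag names in the regular expression.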

Finally, there are other utility functions to remove comments and clean up excessive whitespace.
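These utility functions are not shown in the article. A hypothetical version (strip_latex_comments and squeeze_blank_lines are names made up for this sketch) has to respect two LaTeX details: an escaped \% is literal text, and a % at the end of a line also swallows the line break. For that reason the sketch drops comment-only lines entirely but cuts a trailing comment down to a bare % instead of deleting it.

```python
import re

# A line consisting only of a comment (TeX ignores leading spaces anyway)
FULL_LINE_COMMENT = re.compile(r"^[ \t]*%.*\n?", re.MULTILINE)
# A % not preceded by a backslash starts a comment running to end of line
TRAILING_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_latex_comments(text: str) -> str:
    r"""Drop comment-only lines; reduce trailing comments to a bare '%'.

    Keeping the '%' preserves TeX's behaviour of joining the line with the
    next one. Limitations: verbatim-like contexts and '\\%' are not handled.
    """
    text = FULL_LINE_COMMENT.sub("", text)
    return TRAILING_COMMENT.sub("%", text)


def squeeze_blank_lines(text: str) -> str:
    """Remove trailing spaces and collapse runs of blank lines into one."""
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    return re.sub(r"\n{3,}", "\n\n", text)


source = "% header\nHello, world. % note\n50\\% off\nfoo%\nbar\n\n\n\nend  \n"
print(squeeze_blank_lines(strip_latex_comments(source)))
```

A single blank line is kept on purpose, since in LaTeX it marks a paragraph break.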



Extras

Additionally, we may want some more structure in our setup, such as:

  • Separation of content into multiple files
```latex
\input{sections/section_devanagari.tex}
\input{sections/section_iast_lower.tex}
\input{sections/section_iast_title.tex}
\input{sections/section_iast_upper.tex}
```

  • Bibliography support with BibTeX
```latex
\bibliographystyle{acm}
\bibliography{papers}
```



Final LaTeX Preparation

We may have used the scheme tags across multiple sections. One option is to apply the transliteration script to every section file, create a new set of section files, and use those to compile the final LaTeX file.

A simpler solution is available in the form of latexpand, which resolves the \input{} commands by including the referenced content, creating a single consolidated LaTeX file.

```shell
latexpand main.tex > single.tex
```
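latexpand handles many more cases than shown here (comments, \include, and so on). Its core idea, recursively replacing each \input{...} with the contents of the referenced file, can be sketched in a few lines of Python. expand_inputs is a hypothetical name, and the sketch assumes that paths are relative to the main file's directory and that a missing extension means .tex.

```python
import re
from pathlib import Path
from typing import Optional

INPUT_RE = re.compile(r"\\input\{([^}]+)\}")


def expand_inputs(path: Path, root: Optional[Path] = None, depth: int = 0) -> str:
    r"""Return the text of `path` with every \input{...} replaced recursively."""
    root = root or path.parent
    if depth > 20:
        raise RecursionError(f"\\input nesting too deep at {path}")
    text = path.read_text(encoding="utf-8")

    def replace(match):
        target = root / match.group(1)
        if target.suffix != ".tex":
            # \input{sections/intro} refers to sections/intro.tex
            target = target.with_name(target.name + ".tex")
        # The returned string is inserted literally (no backreference parsing)
        return expand_inputs(target, root, depth + 1)

    return INPUT_RE.sub(replace, text)
```

Unlike latexpand, this sketch would also expand an \input{} that sits inside a comment, which is one reason to prefer the dedicated tool.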

Now, we can run the Python script on single.tex to resolve the transliteration tags.

```shell
python3 finalize.py single.tex final.tex
```



Compilation

When working with BibTeX, we often need to compile multiple times to get the correct rendering of references in the PDF. Usually, this requires:

```shell
xelatex final
bibtex final
xelatex final
xelatex final
```
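The same four-step sequence can also be scripted. The sketch below (compile_commands and compile_document are hypothetical names) only builds the commands listed above and runs them with subprocess, stopping at the first failing step; latexmk, described next, additionally works out how many passes are actually needed.

```python
import subprocess
from typing import List


def compile_commands(jobname: str = "final") -> List[List[str]]:
    """The xelatex / bibtex / xelatex / xelatex sequence for `jobname`."""
    return [
        ["xelatex", jobname],
        ["bibtex", jobname],
        ["xelatex", jobname],
        ["xelatex", jobname],
    ]


def compile_document(jobname: str = "final") -> None:
    for cmd in compile_commands(jobname):
        # check=True raises CalledProcessError as soon as a step fails
        subprocess.run(cmd, check=True)
```

On an error, xelatex may stop and wait for terminal input; adding -interaction=nonstopmode to the xelatex commands avoids that when running unattended.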

Alternatively, we can use latexmk, which takes care of the tedious compilation routine and reduces our task to a single command:

```shell
latexmk -pdflatex='xelatex %O %S' -pdf -ps- -dvi- final.tex
```

Another benefit of using latexmk is that we can clean up the numerous files generated by the LaTeX engine with a one-liner as well:

```shell
latexmk -c
```



Makefile

Finally, we can put all the console commands together in a Makefile.

```makefile
.PHONY: all distclean clean

all: .all

.all: main.tex sections/*.tex papers.bib
	latexpand main.tex > single.tex
	python3 finalize.py single.tex final.tex

	latexmk -pdflatex='xelatex %O %S' -pdf -ps- -dvi- final.tex

distclean:
	latexmk -C
	rm -f single.tex
	rm -f final.tex

clean:
	latexmk -c
```

Thus, we can now focus on writing content in the .tex files and, once we are done, simply run:

```shell
make
```



Requirements

We have made use of a number of external tools, and these need to be set up before using the described solution.



Minimal Requirements

The minimal example mentioned earlier requires only three things:



Additional Requirements

The extras have some additional dependencies.

  • BibTeX (optional) (bibliography support)
  • latexpand (optional) (resolving \input{})
  • latexmk (optional) (simpler TeX compilation)



Devanagari Fonts

Nowadays, there are several good Devanagari fonts available. Google Fonts also provides a wide variety of Devanagari fonts.

Two of my personal favourites are:



Code

The source code for the complete setup is available at hrishikeshrt/devanagari-transliteration-latex.
