Transcribing Communications Recordings with Deepgram

In this digital age, where virtual conferences are a dime a dozen, many of them are recorded for future use. There are many uses for these recordings, including sharing with people who were unable to attend live, distributing them as training material, and keeping backups for future reference. One aspect of these recordings that’s taken for granted, however, is accessibility. In this blog, we’ll demonstrate how to take recordings from your Communications conferences and use Deepgram to transcribe them to text.

Having text copies of your conference recordings is a good way to offer alternative ways to digest the information. Some people read faster than they listen to spoken words. Some people might not speak the same first language as the one used in the conference, and are more comfortable reading it. Others might be hearing impaired, and prefer to read for the greatest comfort. Whatever reason one might have, we want to make it simple to automate the transcription generation process. Here, we will be using the Communications REST APIs in tandem with Deepgram’s Pre-recorded Audio API in Python as an example of how to automate this process.

Installing Libraries

Before we begin coding, we need to ensure we have all the right libraries for calling these APIs. We can do this with a simple pip command (use the appropriate pip command for your operating system):

pip3 install asyncio deepgram-sdk dolbyio-rest-apis

This will install both the Dolby.io and Deepgram SDKs, as well as Python’s native asynchronous function library to help us call the async requests the two SDKs use.

It is also a good idea to sign up for a free Dolby.io and Deepgram account if you haven’t already, to get your API credentials.

Obtaining an API Token

In order to use the Communications REST APIs, we first need to generate a temporary access token. This helps prevent your permanent account credentials from being accidentally leaked, as the token will expire automatically. To learn more about this, read the documentation. In this case, we want to fill in the consumer key and secret with our credentials from our Communications APIs (not Media). We then call the get_api_access_token endpoint within a function so we can generate a fresh token every time we make another call. This isn’t the most secure way to handle this, but it will ensure we don’t run into any expired credentials down the road. To learn more, see our security best practices guide.

from dolbyio_rest_apis.communications import authentication
import asyncio

# Enter your Communications credentials here (placeholder values)
CONSUMER_KEY = "<YOUR_CONSUMER_KEY>"
CONSUMER_SECRET = "<YOUR_CONSUMER_SECRET>"

# Create a function that will generate a new API access token when needed
async def gen_token():
    response = await authentication.get_api_access_token(CONSUMER_KEY, CONSUMER_SECRET)
    return response['access_token']

# Top-level await works in a Jupyter notebook
print(f"Access Token: {await gen_token()}")
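Rather than pasting credentials into a notebook, one option is to read them from environment variables. The sketch below is only an illustration: the helper load_credentials and the variable names DOLBYIO_CONSUMER_KEY and DOLBYIO_CONSUMER_SECRET are our own assumptions, not something the SDK requires.

```python
import os

def load_credentials(env=os.environ):
    """Return (consumer_key, consumer_secret) read from environment variables.

    The variable names are hypothetical; rename them to match your setup.
    """
    key = env.get("DOLBYIO_CONSUMER_KEY")
    secret = env.get("DOLBYIO_CONSUMER_SECRET")
    if not key or not secret:
        raise RuntimeError(
            "Set DOLBYIO_CONSUMER_KEY and DOLBYIO_CONSUMER_SECRET first"
        )
    return key, secret
```

Calling CONSUMER_KEY, CONSUMER_SECRET = load_credentials() can then supply the values that gen_token uses.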

Getting the Conference ID

Now that we can call the APIs, we first want to get the internal conference ID of the recording we want to transcribe. We can do this by simply calling the get_conferences endpoint with our token.

from dolbyio_rest_apis.communications.monitor import conferences

response = await conferences.get_conferences(await gen_token())
# Save the most recent conference. Change '-1' to whichever conference you want.
confId = response['conferences'][-1]['confId']

Note that in this code sample, we’re indexing the response with ['conferences'][-1]['confId']. This pulls only the most recent conference in the list, as indicated by the “-1” index. If you are automating this to work with every newly generated conference, this likely will not be an issue. However, if you want to do this with a specific conference, we suggest using the optional parameters in the get_conferences endpoint to obtain the desired conference ID.
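If filtering on the server side is not convenient, the list that comes back can also be searched locally. The following is a minimal sketch under stated assumptions: the helper find_conf_id is hypothetical, and the 'alias' field on each conference is an assumption about the response shape that should be checked against the API reference. Only 'conferences' and 'confId' come from the code above.

```python
def find_conf_id(response, alias):
    """Return the confId of the last listed conference with a matching alias."""
    # Walk the list backwards so the most recent match wins, mirroring
    # the '-1' index used in the original sample.
    for conference in reversed(response.get("conferences", [])):
        if conference.get("alias") == alias:  # 'alias' is an assumed field
            return conference["confId"]
    return None

# Hand-written sample shaped like the get_conferences response
sample = {"conferences": [
    {"confId": "id-1", "alias": "standup"},
    {"confId": "id-2", "alias": "all-hands"},
    {"confId": "id-3", "alias": "standup"},
]}
print(find_conf_id(sample, "standup"))  # prints id-3
```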

Obtaining the Recording

With the conference ID in hand, we can now call an endpoint to generate a URL that contains the audio file of our conference. For this code sample, we’re using a Dolby Voice conference, so we’ll use the endpoint to Get the Dolby Voice audio recording. If you know you’re not using Dolby Voice, you can use the download_mp3_recording endpoint instead. Note that we’re only obtaining the audio track of the conference instead of both the audio and the video. This is for maximum file compatibility with the transcription software. Note that the URL produced is also temporary, and will expire after a while.

from dolbyio_rest_apis.communications.monitor import recordings

# Save only the mp3 file and return it as a URL.
# If your conference does not use Dolby Voice, use 'download_mp3_recording' instead.
response = await recordings.get_dolby_voice_recordings(await gen_token(), confId)
recording_url = response['url']

To assist illustrate, here is an example conference recording made for transcription generated from the above code.

Transcribing it with Deepgram

While Deepgram does work with local files, the presigned recording URL saves us many steps, avoiding the hassle of downloading a file and uploading it to a secure server. With the URL, we can skip these steps and directly insert the URL into the code below, adapted from their Python Getting Started Guide. The code provided only uses the Punctuation feature, but could easily be expanded with an assortment of the many features Deepgram provides.

from deepgram import Deepgram
import asyncio
import sys

# Your Deepgram API Key (placeholder value)
DEEPGRAM_API_KEY = "<YOUR_DEEPGRAM_API_KEY>"

# Location of the file you want to transcribe. Should include filename and extension.
FILE = recording_url

async def main():

  # Initialize the Deepgram SDK
  deepgram = Deepgram(DEEPGRAM_API_KEY)

  # The file is remote, so set the source to its URL
  source = {
    'url': FILE
  }

  # Send the audio to Deepgram and get the response
  response = await asyncio.create_task(
    deepgram.transcription.prerecorded(
      source,
      {
        'punctuate': True
      }
    )
  )

  # Write only the transcript to the console
  print(response['results']['channels'][0]['alternatives'][0]['transcript'])

try:
  await main()
  # If not running in a Jupyter notebook, run main with this line instead:
  # asyncio.run(main())
except Exception as e:
  exception_type, exception_object, exception_traceback = sys.exc_info()
  line_number = exception_traceback.tb_lineno
  print(f'line {line_number}: {exception_type} - {e}')
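The top-level await calls in this post only work in environments that already run an event loop, such as a Jupyter notebook. In a plain Python script, the same coroutines can be driven with asyncio.run. The sketch below uses a stand-in coroutine, fake_transcribe (a hypothetical name), in place of the real API requests so that it runs without any credentials.

```python
import asyncio

async def fake_transcribe(url):
    # Stand-in for the real Deepgram request; it only echoes the URL.
    await asyncio.sleep(0)
    return f"transcript for {url}"

async def main():
    # In a real script, the token, conference ID, and recording URL
    # lookups from the earlier sections would be awaited here.
    return await fake_transcribe("https://example.com/recording.mp3")

if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main(), and closes the loop.
    print(asyncio.run(main()))
```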

The Deepgram response provides many data points related to our speech, but to pull only the transcription of the file, we’re reading ['results']['channels'][0]['alternatives'][0]['transcript']. Feel free to modify this to extract whatever is most relevant to your needs. For the sample recording above, the result of the transcription is as follows:

Following text is a transcription of the s en of the parchment declaration of independence. The document on display in the rot the national archives Museum. The spelling and punctuation reflects the originals.
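The indexing path used to read the transcript assumes the response always contains at least one channel and one alternative. A slightly more defensive sketch is shown here; the helper extract_transcripts is hypothetical, and the sample dictionary is hand-written to mirror only the keys named in this post.

```python
def extract_transcripts(response):
    """Return the first alternative's transcript for every channel."""
    transcripts = []
    for channel in response.get("results", {}).get("channels", []):
        alternatives = channel.get("alternatives", [])
        if alternatives:
            # The first alternative is the one the post reads
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts

# Hand-written sample, not real API output
sample = {"results": {"channels": [
    {"alternatives": [{"transcript": "Hello world."}]},
]}}
print(extract_transcripts(sample))  # prints ['Hello world.']
```

An empty or malformed response yields an empty list instead of raising a KeyError or IndexError.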

Next Steps

This is a very basic foray into how to get started with transcribing your conference recordings. We strongly suggest you invest some time in expanding this to fit your specific use case, to maximize the benefit you get from using these tools.

As mentioned before, we suggest taking a look at what Deepgram has to offer in terms of additional features you could add to the transcription process. For example:

  • Diarization can help differentiate who is saying what when there are multiple people in a conference.
  • Named Entity Recognition and/or Keywords can help improve accuracy by providing prior knowledge of things like names and proper nouns.
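To give a feel for what diarized output could be turned into, here is a rough sketch that groups consecutive words by speaker. It rests on assumptions that should be verified against Deepgram’s documentation: that diarization is enabled in the request options, and that each word entry carries 'word' (or 'punctuated_word') and 'speaker' fields. The helper speaker_turns and the sample data are hypothetical.

```python
def speaker_turns(words):
    """Group consecutive words by speaker into (speaker, text) turns."""
    turns = []
    for item in words:
        speaker = item.get("speaker")  # assumed field name
        text = item.get("punctuated_word", item.get("word", ""))
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn
            turns[-1][1].append(text)
        else:
            # Speaker changed: start a new turn
            turns.append((speaker, [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]

# Hand-written sample, not real API output
sample_words = [
    {"word": "hello", "speaker": 0},
    {"word": "there", "speaker": 0},
    {"word": "hi", "speaker": 1},
]
for speaker, text in speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```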

The transcription of the example recording was not perfect. There are many reasons for this, including imperfect recording environments, confusing speech patterns, and compression. To give the transcription algorithms a better chance, one option could be to use the Media Enhance API to attempt to clean up the audio before sending it for transcription.

If you want to automatically generate a transcription after every recording is over, you can take advantage of webhooks to remove the manual intervention. In fact, the Recording.Audio.Available event provides the recording URL within the event body itself, reducing the number of steps needed to obtain it.
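As a rough illustration of the webhook idea, the sketch below pulls a recording URL out of a JSON event body. Only the event name Recording.Audio.Available comes from this post; the helper recording_url_from_event and the field names 'eventType' and 'url' are assumptions made for the example and should be checked against the webhook documentation.

```python
import json

def recording_url_from_event(raw_body):
    """Return the recording URL from a webhook payload, or None otherwise."""
    event = json.loads(raw_body)
    # 'eventType' and 'url' are assumed field names for illustration
    if event.get("eventType") != "Recording.Audio.Available":
        return None
    return event.get("url")

# Hand-written sample payload, not a real event
sample_body = json.dumps({
    "eventType": "Recording.Audio.Available",
    "url": "https://example.com/recording.mp3",
})
print(recording_url_from_event(sample_body))
```

The returned URL could then be passed to the transcription step in place of recording_url.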

One final idea: if you only have the video file available for whatever reason, you can use the Media Transcode API to convert the video file into a format accepted by the transcription service.

You can find the source code saved in a Jupyter notebook at this GitHub repository. If you run into any issues, don’t hesitate to contact our support team for help, and good luck coding!
