
Amazon Transcribe API - a Step-by-Step Tutorial

12 Minute Read · October 21, 2018 · by Derek Pankaew

Amazon Transcribe is a speech-to-text API from Amazon. Built on the same engine that powers Alexa, Amazon’s speech recognition technology is better than almost every competitor’s except Google’s.

This tutorial will show you how to use Amazon Transcribe, from start to finish, assuming no prior knowledge of AWS. If you already have AWS set up, including credentials, then skip to the “How Amazon Transcribe’s API Works” section below to dive directly into the speech-to-text implementation.

Without further ado, let’s get started!

Introduction to Amazon Transcribe

Amazon Transcribe is priced at $1.44 per hour of audio ($0.024 per minute), which is exactly the same as Google Speech. In terms of accuracy, Google Speech is slightly ahead. Both offer key features like punctuation, timestamps, and speaker differentiation.

The main difference between Amazon Transcribe and Google / IBM is that Amazon Transcribe is asynchronous. You can upload hundreds of files to Amazon Transcribe and initiate transcription jobs at the same time. With Google and IBM, you have to transcribe sequentially. Note that audio files must be uploaded to Amazon S3 before transcription.

In short, if you’re uploading large batches of files, Amazon Transcribe is better suited than Google Speech. On the other hand, if you want to use streaming audio, or if speed is important, then Google Speech is better. 

Another reason to use Amazon Transcribe over Google or IBM is to stay within Amazon’s ecosystem. If you’re already using AWS, S3, EC2, or other Amazon services, it may be easier to stay in the same ecosystem and use Amazon Transcribe.

Let’s take a look at how to get this set up.

Setup Step 1: Log In to Your AWS Console

Using Amazon Transcribe requires an AWS account. Sign up or log in here:

https://console.aws.amazon.com/console/

Setup Step 2: Create an S3 Bucket

Create a bucket to hold your audio files. Amazon Transcribe only reads audio from Amazon S3. In other words, transcription is a two-step process: first upload, then transcribe.

To create an S3 bucket, just search for “S3” in the AWS search bar.

Once inside the S3 interface, click “Create Bucket.”

Follow the on-screen instructions to create the bucket.

Setup Step 3: Upload Your Audio to S3

Unlike Google Speech, Amazon Transcribe does not require you to pre-process your audio. You can upload as mp3 or wav, in whatever compression you want.

For the purposes of this tutorial, we’ll just upload the file through the AWS interface. To upload files programmatically later on, use the Amazon S3 API.
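For readers who do want to script the upload, here is a minimal sketch. It assumes the aws-sdk package from Setup Step 5 is installed and the credentials from Setup Step 4 are in place. The helper names buildUploadParams and uploadAudio are made up for illustration; the S3 client is passed in by the caller rather than created inside the helper.

```javascript
const fs = require('fs');
const path = require('path');

// Build the parameters for an S3 upload. The object key defaults to the
// file's base name, e.g. "./audio/5mins.wav" becomes "5mins.wav".
function buildUploadParams(bucket, filePath, body) {
    return {
        Bucket: bucket,
        Key: path.basename(filePath),
        Body: body
    };
}

// Upload one local file. `s3` is an AWS.S3 client supplied by the caller.
// Returns a promise that resolves with the upload result (including Location).
function uploadAudio(s3, bucket, filePath) {
    const params = buildUploadParams(bucket, filePath, fs.createReadStream(filePath));
    return s3.upload(params).promise();
}

// Usage sketch (not executed here; needs aws-sdk and valid credentials):
//   const AWS = require('aws-sdk');
//   const s3 = new AWS.S3({ region: 'us-east-1' });
//   uploadAudio(s3, 'speechtotext91234', './5mins.wav')
//       .then(data => console.log(data.Location))
//       .catch(console.error);
```

Passing the client in keeps the helper easy to test and lets you reuse one client for many uploads.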

Setup Step 4: Setup User, Keys, and AWS Credentials

If it’s your first time using AWS, you’ll need to download your credentials. Skip this step if you already have credentials set up. The documentation here goes through this in more detail.

To create your credentials, use Amazon’s Identity and Access Management (IAM) system, located here:

https://console.aws.amazon.com/iam/

Once you’re logged in, create a new user and make sure to check “Programmatic Access”.

When asked to add user permissions, make sure to add both Amazon Transcribe and S3 permissions to the user.

Once the user is created, you’ll have your Access Key ID and Secret Access Key.

Now, add these keys to your AWS credentials file:
Mac and Linux: ~/.aws/credentials
Windows: C:\Users\USERNAME\.aws\credentials

For example, here’s how I added mine on Mac:

Dereks-Macbook-Pro:~ derekp$ mkdir .aws
Dereks-Macbook-Pro:~ derekp$ cd .aws
Dereks-Macbook-Pro:.aws derekp$ cat > credentials
[default]
aws_access_key_id = ABISIIKQBG00312TGPAKQQ
aws_secret_access_key = BeSNLAWY5p98AmA8HMu22RVMbuAHcUk9lcKkUk

Setup Step 5: Download the AWS SDK

Download the SDK for the language you’re using. You can download using pip / gem / npm, or via .zip file download. More details here:

https://aws.amazon.com/tools/#sdk

We’ll proceed with Node.js / JavaScript, so start with npm:

npm install --save aws-sdk

If you’re using Python, the AWS SDK is called boto3. Documentation here. Install it using pip:

pip install boto3

Now that we’re all set up, we’re finally ready to dive into the Amazon Transcribe API.

How Amazon Transcribe’s API Works

The paradigm behind Amazon Transcribe is different from that of Google Speech or IBM Watson. Both of those expect you to upload a file and wait for a transcription to be returned. They also support streaming.

On the other hand, with Amazon Transcribe, you upload your files to S3 and schedule them for transcription. Everything goes into a queue, which you can check at a later time. The advantage of this approach is you can schedule hundreds or thousands of jobs without waiting for a response first.

Once a speech-to-text request has started, it takes roughly as long as the audio itself to finish (about 1x real time). You can then use the API to list or fetch all your completed transcripts.

There are 3 primary commands you need to know to use the API:

1. Create a new transcription job
2. List existing transcription jobs
3. Fetch the results of a completed transcription job

Let’s take a look at all three.

Creating a Transcription Job

Start by importing the AWS SDK and creating the transcriber service client.

const AWS = require('aws-sdk')
AWS.config.update({region:'us-east-1'}) // set the region before creating the client
const transcriber = new AWS.TranscribeService()

Next, we set up our parameters. Amazon has several other options you can configure, available here. For now, we’ll just use the required parameters:

var params = {
    LanguageCode: "en-US",
    Media: {
        MediaFileUri: 'https://s3.amazonaws.com/speechtotext91234/5mins.wav'
    },
    MediaFormat: "wav",
    TranscriptionJobName: '5MinuteTest'
};

The MediaFileUri field must be a file stored in S3.
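As an illustration of the optional settings, the sketch below turns on speaker differentiation. Settings, ShowSpeakerLabels, and MaxSpeakerLabels are fields of the StartTranscriptionJob request; the helper name withSpeakerLabels and the choice of two speakers are assumptions made for this example, so check the API reference for the allowed range before relying on it.

```javascript
// Return a copy of the job parameters with speaker identification enabled.
// The API expects MaxSpeakerLabels whenever ShowSpeakerLabels is true.
function withSpeakerLabels(params, maxSpeakers) {
    return {
        ...params,
        Settings: {
            ...(params.Settings || {}),
            ShowSpeakerLabels: true,
            MaxSpeakerLabels: maxSpeakers
        }
    };
}

// Same required parameters as in the tutorial.
const baseParams = {
    LanguageCode: 'en-US',
    Media: { MediaFileUri: 'https://s3.amazonaws.com/speechtotext91234/5mins.wav' },
    MediaFormat: 'wav',
    TranscriptionJobName: '5MinuteTest'
};

// Assumed example: a recording with two speakers.
const labelled = withSpeakerLabels(baseParams, 2);
console.log(labelled.Settings);
```

The helper copies the object instead of mutating it, so the plain parameters stay available for jobs that do not need speaker labels.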

Next, we start a transcription job:

transcriber.startTranscriptionJob(params, (err,result) => {
    if(err) throw err;
    console.log(result);
});

Perfect.
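Because jobs are queued rather than awaited, the same call can be repeated for a whole batch of files. The following is a rough sketch, not a definitive implementation: buildBatchParams and startAll are hypothetical helper names, the media format is guessed from the file extension, and the job name is derived from the object key (job names may only contain letters, digits, dots, underscores, and hyphens). The transcriber client is passed in by the caller.

```javascript
// Turn a list of S3 object keys into one startTranscriptionJob request each.
function buildBatchParams(bucket, keys, languageCode = 'en-US') {
    return keys.map(key => {
        // Assumption: the file extension matches the media format ("wav", "mp3", ...).
        const ext = key.split('.').pop().toLowerCase();
        return {
            LanguageCode: languageCode,
            Media: { MediaFileUri: `https://s3.amazonaws.com/${bucket}/${key}` },
            MediaFormat: ext,
            // Job names must be unique, so derive one from the object key.
            TranscriptionJobName: key.replace(/[^0-9a-zA-Z._-]/g, '_')
        };
    });
}

// Submit every job without waiting for any transcript to finish.
// `transcriber` is an AWS.TranscribeService client supplied by the caller.
function startAll(transcriber, bucket, keys) {
    return Promise.all(
        buildBatchParams(bucket, keys).map(p => transcriber.startTranscriptionJob(p).promise())
    );
}

// Usage sketch (not executed here):
//   startAll(transcriber, 'speechtotext91234', ['5mins.wav', 'calls/intro.mp3'])
//       .then(results => console.log(results.length + ' jobs queued'))
//       .catch(console.error);
```

For very large batches, AWS service quotas may throttle requests, so a production version would likely submit in smaller chunks and retry on errors.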

Listing Your Transcription Jobs

This is essentially the “ls” command of Amazon Transcribe. The documentation is here.

For this tutorial, we’ll set params to an empty object – {}. You can further narrow down your ls command using params in the future.

Here’s the code:

const AWS = require('aws-sdk')
AWS.config.update({region:'us-east-1'})
const transcriber = new AWS.TranscribeService()

var params = {};
transcriber.listTranscriptionJobs(params, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else console.log(data); // successful response
});

Pretty simple. Pass in the empty object, and your previous jobs are returned as the result:

{ TranscriptionJobSummaries:
[ { TranscriptionJobName: '5MinuteTest',
    CreationTime: 2018-10-30T10:50:37.736Z,
    LanguageCode: 'en-US',
    TranscriptionJobStatus: 'IN_PROGRESS',
    OutputLocationType: 'SERVICE_BUCKET' } 
] }

Once the transcription is complete, the TranscriptionJobStatus field will change from “IN_PROGRESS” to “COMPLETED”.

Once the status switches to “COMPLETED”, you can access it using getTranscriptionJob.
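To narrow the listing down to finished jobs, the request accepts optional filters such as Status and MaxResults; the status string the API returns for a finished job is COMPLETED. The sketch below shows both the filter parameters and a small helper for picking job names out of a response. The helper name jobNamesWithStatus and the second job in the sample response ('OlderJob') are invented for illustration.

```javascript
// Parameters asking the service for completed jobs only.
// Status and MaxResults are optional listTranscriptionJobs fields.
const listParams = { Status: 'COMPLETED', MaxResults: 50 };

// Pull the job names out of a listTranscriptionJobs response,
// keeping only the jobs in the given status.
function jobNamesWithStatus(response, status) {
    return (response.TranscriptionJobSummaries || [])
        .filter(job => job.TranscriptionJobStatus === status)
        .map(job => job.TranscriptionJobName);
}

// A response shaped like the listing output, plus a hypothetical finished job.
const sampleResponse = {
    TranscriptionJobSummaries: [
        { TranscriptionJobName: '5MinuteTest', TranscriptionJobStatus: 'IN_PROGRESS' },
        { TranscriptionJobName: 'OlderJob', TranscriptionJobStatus: 'COMPLETED' }
    ]
};

console.log(jobNamesWithStatus(sampleResponse, 'COMPLETED'));

// Usage sketch against the real service (not executed here):
//   transcriber.listTranscriptionJobs(listParams, (err, data) => {
//       if (err) return console.log(err, err.stack);
//       console.log(jobNamesWithStatus(data, 'COMPLETED'));
//   });
```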

Getting a Completed Transcription

Once a transcription job is complete, you can fetch it from the server using the getTranscriptionJob command.

The API and code format are similar to everything we’ve used so far:

const AWS = require('aws-sdk')
AWS.config.update({region:'us-east-1'})
const transcriber = new AWS.TranscribeService()

var params = {
    TranscriptionJobName: '5MinuteTest'
};
transcriber.getTranscriptionJob(params, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else console.log(data); // successful response
});

Amazon returns a JSON object whose “TranscriptFileUri” field holds a URL pointing to your transcription:

{ TranscriptionJob:
   { TranscriptionJobName: '5MinuteTest',
     TranscriptionJobStatus: 'COMPLETED',
     LanguageCode: 'en-US',
     MediaSampleRateHertz: 16000,
     MediaFormat: 'wav',
     Media:
      { MediaFileUri: 'https://s3.amazonaws.com/speechtotext91234/5mins.wav' },
     Transcript:
      { TranscriptFileUri: 'https://s3.amazonaws.com/aws-transcribe-us-east-1-prod/944367763317/5MinuteTest/5e2fc1f4-06b6-4e77-a087-5a29aa7e6722/asrOutput.json?X-Amz-Security-Token=FQoGZXIvYXdzEIv%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDBPTyp4ZYiM%2FdrNe4SK3A9A6JhIeKTRwvA%2BJKFe%2FEjSs7j578g39bJR5HJVsB1RqBhcuh1MZCRZhvaTpezVFOnZndUUFk3hgBo%2F9IiLJDpvAkGE6Hho0dGLE8yvkx44BR%2Bv8DJ%2BPWIiVItH1u16Fsa25Yse13jKAGnuizO7IbEt9hmhNV%2BNLjiia3obT0Asl1t1YSIdDxm2Dac%2FqrMq4sTUSlz5grl9uFqkJ74fewIMmhAexXP1xXnHjORHLjiZvb4%2Faa1y2kZ5%2F1Ne9RSXhgEm7v%2F8J0PmOsB9PBNz5664mgNC%2FjfF6CLXzLlwHJ1sRVho2X%2BZPBEL1NSGKzzcKmgYPEreWToMzkfcfukpy3sRRshdwqV7EghZeoukUgvD%2FLbr%2BkSvJULfHZ9peJ6i5rINgKNgtRHa8aDdoj2rrBb1AixqyaAGAgr57Vnfc1pQDoquM9tEHXbmJhjrTymJnhmxRrj5r42MspbHrFSIhUK1heUKBt68dc3tYAYhz9hzd5ZtK3zZfa7%2B%2BdS4VXTm4x%2BICQQif6ydNB8fqv0EDM8yzlsjDhw0CS3fHXbDBMRrW68WlsLXSLrWMqehW%2Fxv8lulkHdlATqAo%2Bs3g3gU%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20181030T105908Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIAUA2QCFAATKXDPRNN%2F20181030%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=9573ff793fd8736496cb3718ce607dbf29a57dfbc9437a05532ebf2ca527b51e' },
     CreationTime: 2018-10-30T10:50:37.736Z,
     CompletionTime: 2018-10-30T10:55:20.407Z,
     Settings: { ChannelIdentification: false } } }
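The URL in TranscriptFileUri is pre-signed and short-lived (the sample above carries X-Amz-Expires=900, i.e. 900 seconds), so it is worth downloading the file right away. Here is a minimal sketch using only Node's built-in https module. It assumes the downloaded JSON keeps the text under results.transcripts[].transcript; the helper names extractTranscriptText and fetchTranscript are made up for this example.

```javascript
const https = require('https');

// Extract the plain-text transcript from the parsed transcript document.
// Assumes the layout { results: { transcripts: [ { transcript: '...' } ] } }.
function extractTranscriptText(doc) {
    const transcripts = (doc.results && doc.results.transcripts) || [];
    return transcripts.map(t => t.transcript).join(' ');
}

// Download the JSON at TranscriptFileUri and resolve with the transcript text.
function fetchTranscript(transcriptFileUri) {
    return new Promise((resolve, reject) => {
        https.get(transcriptFileUri, res => {
            if (res.statusCode !== 200) {
                res.resume(); // discard the body so the socket is released
                return reject(new Error(`Unexpected status ${res.statusCode}`));
            }
            let body = '';
            res.setEncoding('utf8');
            res.on('data', chunk => { body += chunk; });
            res.on('end', () => {
                try {
                    resolve(extractTranscriptText(JSON.parse(body)));
                } catch (e) {
                    reject(e);
                }
            });
        }).on('error', reject);
    });
}

// Usage sketch inside the getTranscriptionJob callback (not executed here):
//   fetchTranscript(data.TranscriptionJob.Transcript.TranscriptFileUri)
//       .then(text => console.log(text))
//       .catch(console.error);
```

If the link has already expired, calling getTranscriptionJob again should return a fresh one.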

Questions or Comments?

That’s about it. You now know how to start new transcription jobs, list existing jobs, and fetch completed transcripts.

Thoughts? Comments? Questions? Post in the comments section below!