
Our Verdict

A slick method of working with audio that utilises transcription in an innovative way. It is affordable and does a reasonable job of transcribing, but sadly it only works with English. Podcast producers might love this solution.


For: Easy to use, free trial, Zapier integration, affordable

Against: Only English transcription

Some transcription tools focus only on converting live or pre-recorded audio into useable text, with little consideration about what that content might be used for subsequently.

Descript is different, in that it is more ‘production’ aligned, and has been built with a series of typical scenarios in mind.

These include the ones you might reasonably expect: transcribing a podcast, a Zoom meeting or a live event. But this tool can also be used to edit sound, overdub audio and manage large amounts of content for researchers to sift and analyse.

In this respect, with the audio being elevated in its importance, Descript might be the right tool for specific niche users, in addition to the core customers who are creating podcasts, YouTube content and similar.

But is Descript different enough in an increasingly competitive sector?

Podcast transcription with Descript

(Image credit: Descript)

Plans and pricing

Descript offers three plans: Free, Producer and Team. As the name suggests, the Free option has no cost, but it is a trial that provides just three hours of transcription and not all the features that Producer and Team include.

For those that pay annually, Producer costs $10 per month ($14 for a monthly subscription), and Team is $15 per month ($18 monthly). These both come with 10 hours of transcription per month, and if you need more, that capacity can be boosted at $2 for each additional hour.

That’s affordable, and Descript offers reduced rates for Enterprise customers, students, educators, and non-profit organisations.

The Team plan is designed to allow multiple users to operate with centralised billing and shared projects, but in other respects it is the same solution that Producer customers get.
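
To make the arithmetic concrete, here is a minimal sketch of how a month's bill could be estimated from the figures quoted above. The function name and the assumption that extra hours are charged pro rata are ours, not Descript's, and the prices are simply those quoted in this review.

```python
# Prices as quoted in this review; they are not fetched from Descript.
PLANS = {
    "producer": {"annual": 10.0, "monthly": 14.0},
    "team": {"annual": 15.0, "monthly": 18.0},
}
INCLUDED_HOURS = 10    # transcription hours bundled with each paid plan per month
EXTRA_HOUR_RATE = 2.0  # dollars per additional hour of transcription


def monthly_cost(plan: str, billing: str, hours_needed: float) -> float:
    """Estimate one month's cost in dollars (hypothetical helper).

    Assumes extra hours are billed pro rata, which the review does not confirm.
    """
    base = PLANS[plan][billing]
    extra_hours = max(0.0, hours_needed - INCLUDED_HOURS)
    return base + extra_hours * EXTRA_HOUR_RATE


# Producer on annual billing with 16 hours of audio: $10 base + 6 extra hours at $2
print(monthly_cost("producer", "annual", 16))  # 22.0
```

On these numbers, a producer transcribing 16 hours in a month would pay roughly $22, which is still modest next to per-minute transcription services.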


Descript offers both an online web-based solution and a desktop application, whichever best suits your needs. Each provides very similar functions, and both need the computer system to be connected to the internet for audio to be uploaded and processed by the Cloud-based service.

Where Descript is slightly different from other tools is that it organises audio content in collective projects rather than making each file a project. This approach is highly suitable for a workflow that holds multiple soundtracks that are then collated for a single project.

This organisation is less important for those using this tool in isolation, but it is ideal for teams working with many audio recordings or video clips, such as those assembling a documentary.

But where Descript truly departs from the norms of transcription is that it is designed as a word processor for sound, in that changes to the transcribed text flow back into the audio file when words or entire sentences are removed.

This feature is perfect for podcast creators, as it enables them to remove unwanted parts of a recording quickly and seamlessly.

With this approach, a podcast can be rapidly distilled down to the parts that people will genuinely want to hear, with all the silences and other audio distractions removed.
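
The idea of editing audio by editing text can be illustrated with a short sketch. This is not Descript's implementation, which isn't public. It assumes only that the transcript carries a start and end time for each word, so that deleting words leaves a list of audio ranges to keep. The `Word` class, `keep_segments` function and `FILLERS` set are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def keep_segments(words, deleted):
    """Return (start, end) audio ranges that survive after deleting words by index."""
    segments = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if segments and (i - 1) not in deleted:
            # Previous word was kept too: extend the current range across the pause.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or the word after a cut: start a new range.
            segments.append((word.start, word.end))
    return segments


FILLERS = {"um", "uh"}  # illustrative list only

words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.4, 0.7),
    Word("welcome", 0.9, 1.4),
    Word("back", 1.5, 1.9),
]
deleted = {i for i, w in enumerate(words) if w.text.lower() in FILLERS}
print(keep_segments(words, deleted))  # [(0.0, 0.3), (0.9, 1.9)]
```

An audio editor would then splice the kept ranges together, which is presumably why removing a word in the text can shorten the recording without touching a waveform by hand.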

And, multiple audio tracks can be layered to provide introductory music, sound effects or transitions as required.

The logical progression of this is a technology that Descript has termed ‘Overdub’, which we’ll talk about later. But you can also overdub in a conventional sense if you find that something in a podcast needs clarification, by re-recording the words and inserting them into the original audio.

Working with recordings

The Descript editor works with audio and video files in an elegant fashion and bears more than a passing resemblance to the Google Docs applications.

We’ve seen many transcription editors that place the text in a simple word processor at the top and the audio waveform at the bottom.

Descript extends this concept by placing the spoken words into the audio wave ribbon, allowing you to quickly move to the part of the recording that you wish to work on.

The editor has two modes depending on what you are trying to do: Edit media and Correct Text. Edit media is for altering the audio by removing words, word gaps or filler words, or by splitting the audio. At any time, you can record some extra sound, or insert it from a media library, as required.

If you are more interested in the text, the Correct Text mode allows the transcription to be edited to fix inaccuracies or to add the names of speakers.

What is fascinating is that if you delete a word or phrase in this mode, the changes are immediately reflected in the audio track. One limitation of this is that you can only remove 40 characters in a single delete, so those wanting to ditch larger parts should consider some judicious audio editing before processing the file in Descript.

Overall, the editor is the highlight of this solution and lacks only a few minor features we've seen elsewhere.


In our testing, this wasn’t the most accurate transcription we’ve seen, but it was also far from the worst.

Descript uses the Google Cloud Speech-to-Text technology, and therefore the level of accuracy is much the same as you might expect from Google Home or the speech-to-text functionality within Google Docs. But it can also use Rev, another provider, and that might offer a better quality transcription solution.

You can’t tell which service is being used on the documents you’ve chosen to transcribe, so that point might be an extraneous detail.

Where these solutions normally fail is when presented with names or words that are unfamiliar to them, and they guess, sometimes wrongly. What’s lacking is a means to add a specific word library, either for a project or more generally, in which incorrectly identified words can be tagged so that they are corrected in the future, or even later in the same recording.

Probably the biggest limitation of Descript is that it can only work with English, and therefore attempts to convert the audio from those who aren’t native English speakers will suffer.


Currently, Overdub is provided as a beta to those that request it, but at the time of writing, we weren’t given access to it.

What we can tell you is that it takes the AI processing of the recording and from that extracts a model of the speaker’s voice, which can then be used to form words and sentences that weren’t spoken. According to those that have experimented with it, the results are impressive, as the simulated words have the same tone and intonation as the rest of the recording.

Obviously, how well this might work will be dependent on the amount of existing audio that the AI has the opportunity to hear and the quality of those recordings. And, it goes almost without saying that how this might be used by those without ethical restraint might be concerning to many.

At this time, Overdub only allows you to sample your own voice for this purpose, and Descript has published an ethical statement outlining its stance on ‘generative media’, as it refers to it.

Like it or not, these abilities are going to become commonplace soon, and for those that need voice-overs for productions without multiple recording sessions, they could bring a significant reduction in production costs.


The security of things processed through Descript is complicated by the way that this solution uses several different outsourced technologies to achieve its objectives.

Specifically, Auth0 is used for identity verification, Zendesk for customer support, Stripe for processing payments, Amazon for storage, Rev for transcription, and Mailchimp for emails.

With so many companies involved, the overall security issue is dispersed, and only as good as the weakest part of the whole.

The better news is that Descript promises that the content that you transcribe is only used for providing the service and is not data mined for sales and marketing information for external businesses. Some information is shared with Google, Amazon, and Stripe, but what is shared is detailed on the Descript website.

And, all data moving around the system is encrypted over HTTPS, and it is encrypted when at rest on the servers.

Our only complaint is that account management offers password protection without two-factor authentication, and as the account contains payment information, that extra layer would be preferable.

Final verdict

We liked Descript, as it is easy to understand how it is meant to be used by those working with podcasts and similar content. For those that need to edit recordings while also creating a transcript, it covers both requirements in one.

What it lacks is any ability to handle anything other than English, and it also doesn’t have a means to auto-correct inaccuracies by learning.

If you work exclusively in English, then this could be a highly affordable solution that delivers a refined product with good quality transcription attached.

It isn’t ideal for transcribing more generally, as there are solutions available that are more accurate even if they take longer to process.

But for those working with podcast audio exclusively in English, Descript could be the solution they’re looking for.
