Reshaping Speech to Text with VIQ Solutions

Learn how VIQ is reshaping the landscape of speech-to-text with cutting-edge technology, addressing resource shortages in courtrooms.

Speech Recognition Technology

Speech recognition technology has existed for many years, and converting speech to text is now commonplace. In this context, a speech engine refers to software that processes and converts spoken language into written text. Multiple speech engines are available from various vendors, making the technology easier for companies of any size to implement.

Are there limitations to using speech recognition?

One of the greatest limitations affecting speech-to-text output is the quality of the audio provided. If the audio is unclear or too quiet, or if people are speaking over each other, transcription accuracy suffers, and with it the usability of the output. Each engine has its own characteristics, such as detecting multiple speakers (also known as “diarisation”), supporting custom vocabularies, and omitting or including disfluencies (“uhs” and “ums”). Some engines also perform better in certain audio environments (such as telephone calls vs. in-person recordings).

VIQ Solutions’ patented software, aiAssist, can deal with all of these variables (and more!) by fine-tuning speaker diarisation, applying post-processing rules to immediately fix errors in proper names and words unique to the customer, optimising our language models for specific industries and regions, and using domain-specific language models (DSLMs) to tune the engine to the specific style characteristics and needs of the customer.
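To make the idea of post-processing rules concrete, here is a minimal Python sketch of how a draft transcript could be corrected with customer-specific spellings. It is an illustration only, not aiAssist's actual implementation: the rule table, the `apply_rules` function, and the example names are all hypothetical.

```python
import re

# Hypothetical rule table: a pattern the engine tends to produce, mapped to
# the spelling the customer prefers. Real rules would come from the customer.
RULES = {
    r"\bnet scribe\b": "NetScribe",
    r"\bjudge smyth\b": "Judge Smythe",
}


def apply_rules(text: str, rules: dict) -> str:
    """Apply each correction rule to the draft, ignoring letter case."""
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


draft = "please open net scribe for Judge Smyth"
print(apply_rules(draft, RULES))
# prints: please open NetScribe for Judge Smythe
```

The word-boundary markers (`\b`) keep a rule from firing inside a longer word, so a name that is already spelled correctly is left untouched.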

How does speech-to-text address resource shortages?

The pool of trained court stenographers or reporters is decreasing as it is no longer a widely sought-after profession. At the same time, speech-to-text technology has become more affordable and widespread. This resource shortage, particularly in smaller rural areas, can be alleviated by cost-effective speech recognition technology, which delivers accurate speech-to-text output in a fraction of the time. The remaining court resources can then edit the speech-to-text draft in a timely manner.

Using AI-based speech technology from VIQ Solutions, a person who edits in NetScribe daily for approximately six weeks can expect around a 30% improvement in productivity. Furthermore, as verbatim edits are made to speech recognition drafts, our DSLMs and post-processing capabilities continuously improve and provide further gains in word accuracy and formatting nuances.

Read more about How a U.S. Judicial District Eliminated Strain on Limited Court Reporting Resources >

Cloud enables speech-to-text flexibility for courts

Using cloud infrastructure is a significant advancement in the speech recognition industry. In many rural areas, court stenographers or reporters own the court documentation they create. With the current resource constraints, the need for a court to own and securely store its documents in the cloud is more crucial than ever. VIQ’s NetScribe technology and government cloud infrastructure provide complete control over documentation and resources in a highly secure environment, ensuring a court or deposition company peace of mind.

Producing a quality document

At the end of the day, end users need the most usable document to gain real benefit from speech-to-text. Several factors go into document usability, including the Word Error Rate (WER), the Diarisation Error Rate (DER), and formatting. VIQ’s NetScribe technology works with aiAssist to address all these nuances and give the end user the most usable document through fine-tuned diarisation, automated formatting, speech-to-text post-processing, and domain-specific language models (DSLMs).
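For readers unfamiliar with WER, the following Python sketch shows the standard way it is usually computed: the number of word substitutions, deletions, and insertions needed to turn the engine's output into the reference transcript, divided by the number of words in the reference. This is a generic illustration of the metric, not VIQ's scoring code; the function name and the simple lower-casing and whitespace tokenisation are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "the court is now in session"
hypothesis = "the court is not in session"
print(round(word_error_rate(reference, hypothesis), 3))
# prints: 0.167
```

In this example one word out of six is wrong, so the WER is 1/6. A lower WER means fewer corrections for the person editing the draft.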

What is a hybrid transcription model?

In a hybrid transcription model, the end user requests a draft transcript and can then request a version professionally edited by a human. Utilising VIQ’s FirstDraft, a user can quickly receive a highly usable, formatted document without human intervention. This is designed specifically for end users who do not require a verbatim transcript, or who want a highly usable document while waiting for the professionally edited verbatim transcript.

How VIQ helps address resource shortages with speech-to-text

VIQ Solutions continually improves its speech-to-text, formatting, and infrastructure technology, making a significant difference for companies in different verticals facing resource shortages. The constant feedback we receive from our customers validates that we are making a meaningful impact on real-world problems: reducing transcription costs, improving document accuracy, and increasing productivity. VIQ will continue to produce features that help with this critical resource shortage issue.

VIQ Solutions’ speech recognition technology is designed to be speech engine agnostic. We work with multiple industry-leading speech engine vendors, so as speech recognition technology evolves or output accuracy changes, we can quickly adapt to offer our customers optimal speech recognition results.
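As a rough illustration of what "speech engine agnostic" can mean in software terms, the Python sketch below routes audio to whichever engine is registered for a given audio environment, so an engine can be swapped without changing the calling code. It is a generic design sketch, not VIQ's architecture: `EngineRouter`, the environment labels, and the two stand-in engine functions are hypothetical, and the stand-ins return placeholder text rather than calling a real vendor.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A transcriber is any callable that takes raw audio bytes and returns text.
Transcriber = Callable[[bytes], str]


@dataclass
class EngineRouter:
    """Picks an engine by audio environment, falling back to a default."""
    engines: Dict[str, Transcriber]
    default: str

    def transcribe(self, audio: bytes, environment: str) -> str:
        engine = self.engines.get(environment, self.engines[self.default])
        return engine(audio)


# Stand-in engines (hypothetical): a real adapter would call a vendor here.
def telephone_engine(audio: bytes) -> str:
    return f"[telephone engine] {len(audio)} bytes transcribed"


def in_person_engine(audio: bytes) -> str:
    return f"[in-person engine] {len(audio)} bytes transcribed"


router = EngineRouter(
    engines={"telephone": telephone_engine, "in_person": in_person_engine},
    default="in_person",
)
print(router.transcribe(b"abc", "telephone"))
# prints: [telephone engine] 3 bytes transcribed
```

Because callers only depend on the router, replacing one vendor's engine with another is a change to the registration table rather than to every place that requests a transcript.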

To learn more, visit or contact