
Giving Healthcare a Voice With AI Driven Text To Speech

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Virtual assistants like Siri, Cortana, or Alexa have changed the way we interact with computers. Imagine the transformational impact of ubiquitous, high-quality AI-generated voices. No need to imagine! Let's actually do it. We'll get started by transforming one of the most tedious aspects of healthcare, the presentation, with text-to-speech. Let's let the AI do the talking for us!

 

The Result!

 

 

Giving Voice to HealthCare

Any setting where systems need to communicate data to busy people who need their hands free is an excellent candidate for text to speech.

For example:

But with text to speech we open up many options beyond just reading data. Messages could be tailored based on audience, personal preference, situation, or other characteristics.
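As a rough illustration of tailoring, here is a minimal Python sketch that picks different wording for the same reading depending on who is listening. The `Listener` class, its `role` and `prefers_brief` fields, and the `tailor_message` function are all hypothetical names made up for this example; they are not part of any Azure API.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    """Hypothetical description of who will hear the message."""
    role: str                    # assumed categories: "clinician" or "patient"
    prefers_brief: bool = False  # assumed personal preference flag


def tailor_message(listener: Listener, reading_name: str, value: int, unit: str) -> str:
    """Return the text to hand to a TTS engine, worded for this listener."""
    if listener.prefers_brief:
        # Shortest form for people who only want the number.
        return f"{reading_name} {value} {unit}."
    if listener.role == "patient":
        # Address the patient directly.
        return f"Your {reading_name.lower()} is {value} {unit}."
    # Default wording for everyone else.
    return f"{reading_name} is currently {value} {unit}."


print(tailor_message(Listener("patient"), "Heart rate", 72, "beats per minute"))
```

The returned string is plain text, so it could be passed to whichever text-to-speech service you use.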

 

We're at the start of our AI-assisted world, and we can't wait to see what you come up with.

 

 

Let's get into the details

What is TTS?

TTS stands for Text-To-Speech. We use TTS all the time. Whenever we hear Siri, Google Assistant, or GPS directions, we're hearing the output of TTS.

TTS is all around us because it's super easy for computers to output text (see: "Hello World") and people typically find voice easy to use.

TTS isn't a new technology. Wikipedia's article on speech synthesis lists 1975 as the year when commercial text to speech first became available.

But if you had an early GPS with TTS, or you listened to phone prompts ("You entered 0-2-3, is that right?"), you'd sometimes hear an obviously generated voice. These TTS voices were created by having an actor record many syllables and then stitching those recordings back together. But this meant that the pauses and the pitch and speed changes that come naturally to all of us were not included.
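To make the stitching idea concrete, here is a toy Python sketch. The `RECORDED_CLIPS` table and its sample values are invented placeholders standing in for real recordings, and `stitch` is a hypothetical helper; real systems of that era were considerably more elaborate.

```python
# Hypothetical pre-recorded clips: one fixed list of audio samples per spoken unit.
RECORDED_CLIPS = {
    "0": [0.1, 0.3, 0.1],
    "2": [0.2, 0.5, 0.2],
    "3": [0.4, 0.1, 0.4],
}


def stitch(units):
    """Concatenate the stored clip for each unit, in order."""
    samples = []
    for unit in units:
        # The same recording is reused every time, whatever comes before or after it.
        samples.extend(RECORDED_CLIPS[unit])
    return samples


audio = stitch(["0", "2", "3"])
print(len(audio))
```

Because each unit always plays back the identical recording, nothing in this approach can adjust pitch, pace, or pauses to suit the sentence, which is why the result sounds generated.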

Enter the next generation of TTS with Azure TTS.

Azure Text to Speech

Azure Text to Speech is part of the next generation of text-to-speech services that use deep neural networks to produce sound. The advantage of this process is the ability to generate voices from fewer samples and to simulate the changes in pitch and speed that make up accents.
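One way to steer pitch and speed yourself is Speech Synthesis Markup Language (SSML), which Azure Text to Speech accepts as input. The sketch below only builds an SSML string with the Python standard library; it does not call the service. The `build_ssml` helper is a hypothetical name, and the default voice `en-US-JennyNeural` and the `rate` and `pitch` values are assumptions for illustration, so check the service's voice list and SSML documentation before relying on them.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", rate="+0%", pitch="+0%", lang="en-US"):
    """Wrap text in SSML that requests a voice plus relative rate and pitch changes.

    The voice name and the percentage values are illustrative assumptions.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, < and > so the markup stays well-formed
        "</prosody></voice></speak>"
    )


ssml = build_ssml("Room 4 & 5 are ready", rate="-10%", pitch="+5%")
print(ssml)
```

The resulting string could then be sent to the speech service through its SDK or REST API in place of plain text.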

Code

All code is available on GitHub
