Next-Gen Customer Service: Azure’s AI-Powered Speech, Translation and Summarization

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

The purpose of this solution is to enable applications to incorporate AI capabilities. In the upcoming demo, I will showcase how to transcribe and synthesize speech (Azure AI Speech), translate (Azure AI Translator), and summarize (Azure AI Language) conversations between customers and businesses without significantly modifying your existing apps.


This application can be helpful in various scenarios where two parties speak different languages and require simultaneous translation. For instance, it can be employed in call centers where the representative and the customer do not speak the same language, by bank tellers dealing with foreign clients, by doctors communicating with elderly patients who do not speak the native language well, and in other similar situations where both parties need to converse in their respective native languages.


In this solution we will use the client-side SDKs and REST APIs of the Azure AI Services from a React.js app. The application's backend is a slim Next.js Node.js server that uses Azure Web PubSub for Socket.IO to provide real-time, duplex communication between the client and the server. The Next.js backend is hosted on Azure Container Apps.
Scenario explained
> Note: This solution focuses on the client-side use of Azure AI Services. To keep the solution simple, we will create the surrounding resources manually using the Azure portal.

You will need:

  • An active Azure subscription. If you don't have one, you can create one for free.
  • A Speech resource (create it in the Azure portal).
  • A Translator resource (create it in the Azure portal).
  • A Language resource (create it in the Azure portal).
  • A Container Registry (create it in the Azure portal).
  • An Azure Web PubSub for Socket.IO resource.

Prepare the environment
Clone the repository
git clone


Set the Next.js environment variables (.env)
> Note: The Next.js local server requires loading environment variables from the .env file. Use the env.sample template to create a new .env file in the project root and replace the placeholders with the actual values.
> Important: It is essential to keep in mind that the .env file should NEVER be committed to the repository.
The .env file should look like this:
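(The authoritative variable names are defined in the env.sample template; the sketch below is purely illustrative, and every name and value in it is a placeholder.)

```
# Azure AI Speech
SPEECH_KEY=<speech-resource-key>
SPEECH_REGION=<speech-resource-region>

# Azure AI Translator
TRANSLATOR_KEY=<translator-resource-key>
TRANSLATOR_REGION=<translator-resource-region>

# Azure AI Language
LANGUAGE_KEY=<language-resource-key>
LANGUAGE_ENDPOINT=https://<language-resource-name>.cognitiveservices.azure.com/

# Azure Web PubSub for Socket.IO
WEB_PUBSUB_CONNECTION_STRING=<web-pubsub-connection-string>
```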
Set the environment variables for the deployment script
> Note: To set up the Azure Container App resource running the Next.js server, you need to set environment variables for the deployment script. The script loads these variables from a file created at the project's root. You can use the provided template to create the new file in the project root, replacing the placeholders with the actual values.
> Important: The file must NOT be committed to the repository.
> Important: The file depends on the .env file we created in the previous step.
The file should look like this:
Create a service principal (App registration) and save it as a GitHub secret
> Note: The GitHub action employs the app registration service principal to handle two roles. First, it pushes the images to the Azure Container Registry (ACR). Second, it deploys Azure Container Apps. To perform these roles, the service principal must have a Contributor role on the Azure resource group.
Log in to Azure:
az login
Create the app registration service principal:
az ad sp create-for-rbac --name aiservices-github --role Contributor --scopes /subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RESOURCE_GROUP} --sdk-auth
The command above will output JSON similar to the block below.
Copy the output and save it in a GitHub secret named AZURE_CREDENTIALS.
{
  "clientId": "00000000-0000-0000-0000-000000000000",
  "clientSecret": "00000000000000000000000000000000",
  "subscriptionId": "00000000-0000-0000-0000-000000000000",
  "tenantId": "00000000-0000-0000-0000-000000000000",
  "activeDirectoryEndpointUrl": "",
  "resourceManagerEndpointUrl": "",
  "activeDirectoryGraphResourceId": "",
  "sqlManagementEndpointUrl": "",
  "galleryEndpointUrl": "",
  "managementEndpointUrl": ""
}
Set other environment variables for the GitHub action
In the GitHub repository settings, go to the Variables tab.
  1. Set the CONTAINER_REGISTRY variable to the name of the Azure Container Registry.
  2. Set the RESOURCE_GROUP variable to the resource group that contains your Container Registry.
  3. Set the CONTAINER_APP_NAME variable to the name of the Azure Container App.
Run it locally
After creating the Azure resources and setting up the environment, you can run the Next.js app locally.
npm install
npm run dev

This command will start the Next.js server on port 3000. The server will also serve the client-side static files.
After the app is running, you can access it locally at http://localhost:3000.
Run it as an Azure Container App
Run the deployment script to create the Azure Container App resource that runs the Next.js server on Azure.
az login

Wait for the app to be deployed, the result will be the FQDN of the Azure Container App.
> Note: You can get the FQDN by running the following command:
az containerapp show \
  --name $CONTAINER_APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --query properties.configuration.ingress.fqdn

> Note: You can also get the Application Url on the Container App resource overview blade.

We have completed the setup part.
Now, you can access the app using the Azure Container App URL.
Sample Application - how to use it

Spoken language: Select the language of the speaker.

Translated language: Select the language to which the spoken language will be translated.


Listen: Start the speech-to-text transcription; this uses the Speech to text SDK for JavaScript package.

> Reference code: /src/stt.js:
// stt.js
// initialize the speech recognizer
const speechConfig = speechsdk.SpeechConfig.fromAuthorizationToken(tokenObj.token, tokenObj.region);
const audioConfig = speechsdk.AudioConfig.fromDefaultMicrophoneInput();
recognizer = new speechsdk.SpeechRecognizer(speechConfig, audioConfig);
// register the event handlers
recognizer.recognized = (sender, event) => { /* handle each recognized phrase */ };
// listen and transcribe
recognizer.startContinuousRecognitionAsync();
Stop: Stop the speech-to-text transcription.
> Reference code: /src/stt.js:
// stt.js
// stop the speech recognizer
recognizer.stopContinuousRecognitionAsync();
Translate: On the listener side, each recognized phrase is translated to the selected language using the Text Translation REST API.
> Reference code: /src/utils.js:
// utils.js
const res = await axios.post(`${config.translateEndpoint}translate?api-version=3.0&from=${from}&to=${to}`, data, headers);
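As a hedged sketch of what such a call assembles (the config field names and the helper below are illustrative assumptions, not the app's actual code), the Translator v3 REST request can be built from its parts like this:

```javascript
// Build the pieces of a Translator v3 "translate" REST request.
// config field names (translateEndpoint, translateKey, translateRegion)
// are illustrative assumptions, not necessarily the app's own.
function buildTranslateRequest(config, text, from, to) {
  return {
    url: `${config.translateEndpoint}translate?api-version=3.0&from=${from}&to=${to}`,
    headers: {
      // Translator expects the resource key and region in these headers.
      'Ocp-Apim-Subscription-Key': config.translateKey,
      'Ocp-Apim-Subscription-Region': config.translateRegion,
      'Content-Type': 'application/json',
    },
    // The v3 API takes an array of { Text } objects as the body.
    body: JSON.stringify([{ Text: text }]),
  };
}

const req = buildTranslateRequest(
  { translateEndpoint: 'https://api.cognitive.microsofttranslator.com/', translateKey: 'key', translateRegion: 'westeurope' },
  'Hello', 'en', 'fr'
);
console.log(req.url); // → https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=en&to=fr
```

Separating request construction from the actual HTTP call like this also makes the URL and header logic easy to unit-test without network access.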
Summarize: Summarize the conversation. For this, we use the Azure AI Language conversation summarization API.
> Reference code: /src/utils.js:
// utils.js
let res = await axios.post(`${config.languageEndpoint}language/analyze-conversations/jobs?api-version=2023-11-15-preview`, data, headers);
const jobId = res.headers['operation-location'];

let completed = false;
while (!completed) {
    res = await axios.get(jobId, headers);
    completed = res.data.tasks.completed > 0;
}

const conv = res.data.tasks.items[0].results.conversations[0].summaries.map((summary) => {
    return { aspect: summary.aspect, text: summary.text };
});
return conv;
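For context, here is a hedged sketch of the kind of request body that `data` carries. The property names follow the analyze-conversations API shape, but the helper itself and its message format are illustrative assumptions, not the app's actual code:

```javascript
// Build an analyze-conversations job payload for conversation summarization.
// This helper and its input message shape are illustrative assumptions.
function buildSummarizeJob(messages) {
  return {
    displayName: 'Conversation summarization',
    analysisInput: {
      conversations: [{
        id: '1',
        language: 'en',
        modality: 'text',
        // Each message becomes a conversation item with a speaker role.
        conversationItems: messages.map((m, i) => ({
          id: String(i + 1),
          participantId: m.speaker,
          role: m.role, // e.g. 'Agent' or 'Customer'
          text: m.text,
        })),
      }],
    },
    tasks: [{
      taskName: 'summary-1',
      kind: 'ConversationalSummarizationTask',
      parameters: { summaryAspects: ['issue', 'resolution'] },
    }],
  };
}

const job = buildSummarizeJob([
  { speaker: 'Customer_1', role: 'Customer', text: 'My card was declined.' },
  { speaker: 'Agent_1', role: 'Agent', text: 'Let me reset it for you.' },
]);
console.log(job.tasks[0].kind); // prints ConversationalSummarizationTask
```

The `summaryAspects` values request an "issue" and a "resolution" summary, which is what the aspect/text pairs returned above correspond to.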
Speak: The translated text will be synthesized to speech using the Text to Speech JavaScript package.
> Reference code: /src/stt.js:

// stt.js
// initialize the speech synthesizer
speechConfig.speechSynthesisVoiceName = speakLanguage;
const synthAudioConfig = speechsdk.AudioConfig.fromDefaultSpeakerOutput();
synthesizer = new speechsdk.SpeechSynthesizer(speechConfig, synthAudioConfig);
// speak the text
synthesizer.speakTextAsync(text,
    function (result) {
        if (result.reason === speechsdk.ResultReason.SynthesizingAudioCompleted) {
            console.log("synthesis finished.");
        } else {
            console.error("Speech synthesis canceled, " + result.errorDetails +
                "\nDid you set the speech resource key and region values?");
        }
    });
Clear: Clear the conversation history.
> Reference code: /src/socket.js:
// socket.js
clearMessages = () =>
Sync: Sync the conversation history between the two parties.
> Reference code: /src/socket.js:
// socket.js
syncMessages = () =>
Resources Deployed in this solution (Azure)


  • Container Registry: for the Next.js app container image.
  • Container App (& Container App Environment): for the Next.js app.
  • Language Service: for the conversation summarization.
  • Log Analytics Workspace: for the logs of the container app.
  • Web PubSub for Socket.IO: for the real-time, duplex communication between the client and the server.
  • Speech service: for the speech-to-text transcription capabilities.
  • Translator service: for the translation capabilities.
Improve recognition accuracy with custom speech

How does it work?
With custom speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint.
Here's more information about the sequence of steps:
  1. Create a project and choose a model. Use a Speech resource that you create in the Azure portal. If you train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. For more information, see footnotes in the regions table.
  2. Upload test data. Upload test data to evaluate the speech to text offering for your applications, tools, and products.
  3. Test recognition quality. Use the Speech Studio to play back uploaded audio and inspect the speech recognition quality of your test data.
  4. Test model quantitatively. Evaluate and improve the accuracy of the speech to text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if more training is required.
  5. Train a model. Provide written transcripts and related text, along with the corresponding audio data. Testing a model before and after training is optional but recommended.
  6. Deploy a model. Once you're satisfied with the test results, deploy the model to a custom endpoint. Except for batch transcription, you must deploy a custom endpoint to use a custom speech model.
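The word error rate from step 4 is just the word-level edit distance between the reference transcript and the hypothesis, divided by the number of reference words. A small illustrative implementation (not the Speech service's own):

```javascript
// Word error rate: Levenshtein distance over words, divided by the
// reference word count. Illustrative sketch, not the service's code.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.split(/\s+/).filter(Boolean);
  const hyp = hypothesis.split(/\s+/).filter(Boolean);
  // dp[i][j] = edits to turn the first i ref words into the first j hyp words
  const dp = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const sub = ref[i - 1] === hyp[j - 1] ? 0 : 1; // substitution cost
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,     // deletion
        dp[i][j - 1] + 1,     // insertion
        dp[i - 1][j - 1] + sub
      );
    }
  }
  return dp[ref.length][hyp.length] / ref.length;
}

// One deleted word out of six reference words -> WER of 1/6.
console.log(wordErrorRate('the cat sat on the mat', 'the cat sat on mat'));
```

A WER of 0 means a perfect transcript; values that stay high after training suggest the model needs more, or better matched, training data.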
We demonstrated how to use Azure AI Services to add transcription, translation, summarization, and speech capabilities to existing apps with minimal effort, focusing on the client-side capabilities. These features can be helpful in scenarios where two parties speak different languages, such as call centers, banks, and medical clinics.
