This post has been republished via RSS; it originally appeared at: IIS Support Blog articles.
From a support perspective, a high-level view of a bot solution helps a lot in figuring out the possible causes of issues, or at least in isolating where they happen.
In fact, we’re talking about at least 3 sides: 3 applications that “talk” to each other via web requests.
- The bot client: the application or means by which a user engages in a bot conversation;
- The bot logic endpoint, the code executed to fulfill the bot’s functionality;
- The bot connector: a proxy or hub conducting messages from client to “bot logic” and then replies back from bot to the client.
The bot solution may offer users various conversational channels (clients) to engage the bot, such as Facebook Messenger or SMS messages. A custom client may be created too with the DirectLine channel, using the DirectLine Client SDK Libraries.
The bot logic is where the developer will deploy the custom bot code, which is the actual functionality. It may be hosted either as an Azure App Service, or as any other on-premises Web application, as long as its endpoint is publicly reachable by calls from the “Connector”.
The “Connector” service is a global application owned and operated by Microsoft, hosted on Azure nodes spread geographically to ensure scalability and fast response. It will “talk” with the client and the logic, providing proxy services in a conversational way. While relaying messages, it will also transform their “envelopes”: from a channel-specific message format to a consistent format used by the bot framework.
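To make the “envelope” idea concrete, here is a minimal sketch in Python. The incoming payload layouts and the function name `to_activity` are hypothetical, invented for this illustration; the output borrows a few field names from the bot framework activity format (`type`, `channelId`, `from`, `conversation`, `text`), but it is a simplification, not the connector's actual implementation.

```python
from typing import Any, Dict


def to_activity(channel_id: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a channel-specific payload to one consistent, activity-like shape.

    The raw payload layouts handled here are made up for this example.
    """
    if channel_id == "sms":
        sender, text = raw["sender_number"], raw["body"]
    elif channel_id == "facebook":
        sender, text = raw["sender"]["id"], raw["message"]["text"]
    else:
        raise ValueError(f"unsupported channel: {channel_id}")

    return {
        "type": "message",
        "channelId": channel_id,
        "from": {"id": sender},
        "conversation": {"id": f"{channel_id}:{sender}"},
        "text": text,
    }


sms = to_activity("sms", {"sender_number": "+15550100", "body": "hello"})
fb = to_activity("facebook", {"sender": {"id": "u42"}, "message": {"text": "hello"}})
# Both channels now yield the same keys, so the bot logic needs only one code path.
print(sorted(sms.keys()) == sorted(fb.keys()))
```

The point of the design is that the bot logic never has to know which channel a message came from, beyond reading `channelId` when it cares.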
Communication between the connector service and the bot logic endpoint is ensured via HTTP POST requests. Each message from the user, and each reply from the bot, becomes a POST request between connector and bot.
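As a rough illustration of that exchange, the sketch below runs a tiny local endpoint and then plays the connector's role by POSTing one message to it. It uses only the Python standard library; the `/api/messages` path and the two-field JSON body are assumptions made for the example, and the 202 status is the acknowledgement discussed later in this post. A real bot would use the Bot Framework SDK rather than a hand-written handler.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BotHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        activity = json.loads(self.rfile.read(length) or b"{}")
        self.server.received.append(activity)  # hand off to the bot logic here
        self.send_response(202)  # Accepted
        self.end_headers()

    def log_message(self, *args) -> None:  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), BotHandler)  # port 0: pick any free port
server.received = []
threading.Thread(target=server.serve_forever, daemon=True).start()

# Play the connector's role: POST one user message to the bot endpoint.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request(
    "POST",
    "/api/messages",  # assumed path, for illustration only
    body=json.dumps({"type": "message", "text": "hello"}),
    headers={"Content-Type": "application/json"},
)
status = conn.getresponse().status
conn.close()
server.shutdown()
server.server_close()
print(status, server.received)
```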
Security. Creating a bot
The HTTP requests between connector and logic are secured and authenticated; they bear a specific HTTP header holding a token issued by Azure Active Directory. Somehow, Azure AD must be aware of both the connector and the bot logic, because both of them “outsource” authentication to Azure AD.
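When troubleshooting authentication failures, it can help to look at what a token actually claims. The sketch below assumes the header has the form `Authorization: Bearer <token>` and that the token is a JWT, which is the format Azure AD issues. The function name `peek_bearer_claims` is hypothetical, and the token in the usage lines is a fake built on the spot. The function decodes the claims without verifying the signature, so it is a diagnostic aid only, never a substitute for the validation the SDK performs.

```python
import base64
import json
from typing import Any, Dict


def peek_bearer_claims(authorization_header: str) -> Dict[str, Any]:
    """Decode the claims of a bearer JWT WITHOUT verifying its signature.

    Diagnostics only: never use unverified claims to authorize a request.
    """
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected 'Bearer <token>'")
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64(obj: Dict[str, Any]) -> str:
    """Helper for the demo: base64url-encode a JSON object, unpadded."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# A fake, unsigned token with made-up claim values, for demonstration only.
fake = ".".join([_b64({"alg": "none"}), _b64({"aud": "my-bot-app-id", "exp": 1700000000}), "sig"])
claims = peek_bearer_claims("Bearer " + fake)
print(claims["aud"], claims["exp"])
```

Checking the audience (`aud`) and expiry (`exp`) claims this way is often enough to tell a misconfigured application ID from an expired token.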
When creating a bot in the Azure portal, we may choose from a couple of provisioning templates:
- Bot Channel Registration: Assumes that the logic, the bot API app, will be hosted on premises. So, this template, to put it simply, will register a new bot with the connector service, adding an application ID entry into the Azure AD tenant that the connector trusts.
- Web App Bot: It is a bot channel registration plus a few things, considering that the bot logic will be hosted in Azure, among which:
  - Provisions an Azure App Service where the bot logic will be executed;
  - Creates (or assumes) an entry in the Azure AD tenant of the subscription where the bot connector service is provisioned;
  - Deploys some bot template code in the Azure App Service created, after populating config files; just a starter for the bot logic to be further developed.
- Functions Bot: Very similar to the Web App Bot template, except that the logic will be hosted in a serverless Azure Functions service.
It is important, when creating bots, to choose Application Insights or at least some other monitoring solution; I’ll detail why shortly. Also, enabling Application Insights needs an extra step; see below in the Monitoring section.
The Gateway Timeout issue
The connector, acting as a proxy between client and logic, has some expectations. When these are not met, issues are recorded.
One of the expectations is that the logic will acknowledge a user message in no more than 15 seconds, a threshold hardwired in the connector that the developer cannot configure.
The connector will relay the user message as a POST request, expecting an HTTP response from the logic with a 202 (Accepted) status code.
However, before the logic can acknowledge the user message with the “202” response, it will usually try to reply to the user, via the connector, with its own POST message to the connector. In many cases, such a reply will involve calls made to dependency services:
- Azure Active Directory:
Since replies from the bot logic are authenticated POST requests, the authentication header must contain a token from the Azure AD.
- Bot state, conversation context:
When responding to a message, the bot logic usually has to correlate that message with a conversation/user context, maybe reply after consulting some state information.
- Other cognitive services:
The bot logic may rely on a QnA/FAQ repository, language understanding (LUIS), image or voice recognition, etc.
- Bot-specific resources:
Sometimes a reply involves querying some database or performing transactions in other systems.
All these dependencies will add to latency. But there is more. As a web application, the bot logic may be “asleep” due to idleness (no requests received in a while), or it may restart due to configuration changes. With ASP.NET Core or Node.js, a separate process from IIS will actually execute the app; yet more latency.
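To see where the time goes, each dependency call can be timed against the connector's 15-second threshold. The sketch below is one possible way to do that in Python; the class name `LatencyBudget` and the dependency labels are hypothetical, and the `time.sleep` calls merely stand in for real calls to Azure AD or a state store.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

ACK_BUDGET_SECONDS = 15.0  # the connector threshold described above


class LatencyBudget:
    """Accumulate time spent per dependency and report what is left."""

    def __init__(self, budget: float = ACK_BUDGET_SECONDS) -> None:
        self.budget = budget
        self.spent: Dict[str, float] = {}

    @contextmanager
    def track(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.spent[name] = self.spent.get(name, 0.0) + elapsed

    @property
    def remaining(self) -> float:
        return self.budget - sum(self.spent.values())


budget = LatencyBudget()
with budget.track("token"):
    time.sleep(0.01)  # stand-in for fetching an Azure AD token
with budget.track("state"):
    time.sleep(0.02)  # stand-in for loading conversation state

for name, seconds in budget.spent.items():
    print(f"{name}: {seconds:.3f}s")
print(f"remaining: {budget.remaining:.3f}s")
```

Logging these per-dependency figures on every request makes it much easier to tell a slow database from a cold start when a timeout is reported.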
The 15-second threshold may be easily exceeded, so all these aspects must be considered when thinking of performance. I’d recommend that the bot replies ASAP to the user, even if the reply is simply something like “I got you, allow me some time”; then the full answer may be provided. Such a fast reply may also be a simple activity update sent by bot to the connector.
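A minimal sketch of this “reply first, answer later” pattern follows. All names are hypothetical: `send` stands in for whatever posts a reply to the connector, and `slow_lookup` simulates the slow dependency calls. A background thread is used here only for brevity; in a hosted web app, work that outlives the request needs more care, since the process may be recycled.

```python
import threading
import time
from typing import Callable, List


def handle_message(text: str, send: Callable[[str], None],
                   slow_answer: Callable[[str], str]) -> threading.Thread:
    """Reply at once with a holding message, then finish the work in the background."""
    send("I got you, allow me some time...")

    def work() -> None:
        send(slow_answer(text))

    worker = threading.Thread(target=work, daemon=True)
    worker.start()
    return worker  # the caller can now acknowledge the request right away


sent: List[str] = []


def slow_lookup(question: str) -> str:
    time.sleep(0.3)  # stand-in for slow dependency calls
    return f"Answer to: {question}"


started = time.perf_counter()
worker = handle_message("opening hours?", sent.append, slow_lookup)
ack_delay = time.perf_counter() - started  # time until the handler returned
worker.join()
print(f"handler returned after {ack_delay:.3f}s; sent: {sent}")
```

The handler returns as soon as the holding message is sent, so the acknowledgement to the connector does not wait for the slow lookup.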
In the end, lots of things may have to happen before the bot can reply to the user. So, there are lots of points where things may go wrong. How does one trace an error or a latency, when the bot logic may be a “black-box”? How can we “see” inside the execution?
While the web server logs or application logs may offer some information, troubleshooting or performance tuning at times needs to rely on more details. We need a monitoring solution. Enter Application Insights.
When creating a bot in the Azure portal, the wizard does offer the option to employ Application Insights as the monitoring solution, and it even places the needed configuration info for the bot logic. But there is a catch: the bot logic, the app, does not log into Application Insights by default; an extra step is needed.
The bot logic must include the right NuGet or NPM packages; these will report on exceptions, for instance, by default, with no extra code added by the developer. Other telemetry info that Application Insights can track will need extra code, if you want to monitor custom events or metrics.
Do enable Application Insights!
I found there is an easier way to enable it from the portal, for the App Service or Azure Functions instance, illustrated below:
Best of luck!