Testing IM and Web-Conferencing Archiving set to Critical

This post has been republished via RSS; it originally appeared at: Skype for Business Blog articles.

First published on TECHNET on Apr 01, 2017
Organizations often choose to enable IM Archiving for different reasons. Some do so for record keeping, while others have a regulatory or compliance requirement to ensure that every IM and Web-Conferencing session is archived.

When an organization is archiving for regulatory or compliance purposes, it may be required to stop the service if archiving is failing. Lync Server 2013 and Skype for Business Server 2015 offer this through the BlockOnArchiveFailure parameter of the Set-CsArchivingConfiguration cmdlet.


Parameter: BlockOnArchiveFailure
Required: Optional
Type: System.Boolean
Description: If True, then the IM service will be suspended any time instant messages cannot be archived. If set to False (the default value), IM will continue even if instant messages cannot be archived.




This setting can also be accessed from the Control Panel.



Organizations run disaster recovery exercises to validate that their infrastructure works as intended and that the organization (or organizational unit) has up-to-date documentation in case a disaster occurs. In the same way, they may want to test or prove that IM and Web-Conferencing would fail if archiving were to fail.

With Lync Server 2013 and Skype for Business Server 2015, proving that IM and Web-Conferencing would stop if archiving were to fail can be a challenge. Here’s why:

Challenge #1
If the Archiving Database is offline, Lync will export storage data to the Web-Services file share (for example, \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\DataArchive\20161122\LyncStd01.contoso.com\ ).

Challenge #2
If the Archiving Database is offline and the Web-Services file share is not accessible, we would see Event ID 32080 and the system would fall back to C:\ProgramData\Microsoft\Skype for Business Server 2015\StorageService.

Challenge #3
If the Archiving Database is offline, the Web-Services file share is not accessible (we would see Event ID 32080), and access to the path C:\ProgramData\Microsoft\Skype for Business Server 2015\StorageService is also blocked, the local database can still hold 5,000 items or up to 10 GB (a SQL Express limitation).

The challenges mentioned above can make a test like this operationally difficult to undo. Reversing each step can take a long time, which can cause a loss of productivity.
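Because Challenge #3 depends on how much room is left in the local Storage Service database, it can help to look at its file sizes before and during a test. The query below is a minimal sketch, not an official procedure. It assumes that you are connected to the local SQL Express instance that hosts LySS on a Front-End Server (commonly named LYNCLOCAL) and that the database is online; the column aliases are illustrative.

```sql
-- Sketch: report allocated and used space for each LySS file, in MB.
-- Assumption: connected to the local instance that hosts LySS,
-- for example with: sqlcmd -S .\LYNCLOCAL -E
USE lyss;

SELECT
    name      AS logical_file_name,
    type_desc AS file_type,                          -- ROWS (data) or LOG
    CAST(size AS bigint) * 8 / 1024 AS allocated_mb, -- size is stored in 8 KB pages
    CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024 AS used_mb
FROM sys.database_files;
```

The SQL Express size limit applies to data files rather than the log, so the ROWS row is the one to compare against the 10 GB ceiling.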

Solution #1
Stop the LyncLocal instance on every Lync Front-End Server in the pool where we want to simulate a failure. Since this is a rather common solution, some may prefer an alternative.

Solution #2
Set the LySS database offline in SQL, so that all access from a communications server is blocked. This can be accomplished by running the following query on each Front-End Server:

ALTER DATABASE LySS SET OFFLINE WITH ROLLBACK IMMEDIATE

As soon as this is completed on an Enterprise Edition pool or a Standard Edition pool, IM messages will stop transmitting from the pool. Presence will still be available, but both IM and Web-Conferencing will fail.
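Before testing from a client, it may be worth confirming that the statement actually took effect on each Front-End Server. The following is a minimal sketch, assuming it is run against the same local instance where LySS was set offline; the lowercase name 'lyss' matches how the database appears in the event text later in this post.

```sql
-- Sketch: show the current state of the LySS database on this server.
-- Assumption: connected to the local instance that hosts LySS.
SELECT name, state_desc, user_access_desc
FROM sys.databases
WHERE name = N'lyss';
-- state_desc reads OFFLINE while the database is offline
-- and ONLINE once it has been brought back.
```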

In order to bring services back to business as usual, bring the database online by running:

ALTER DATABASE LySS SET ONLINE

Once the databases in your routing group are online, you will be able to use IM and Web-Conferencing again.
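To make sure that no Front-End Server was missed when bringing things back, one option is to list any database on the instance that is not online. This is only a sketch under the same assumption as before (a connection to the local instance on each Front-End Server); an empty result means every database on that instance reports ONLINE.

```sql
-- Sketch: list databases on this instance that are not online.
-- Assumption: run once per Front-End Server, against its local instance.
SELECT name, state_desc
FROM sys.databases
WHERE state_desc <> N'ONLINE';
```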



Here are some event logs that may be useful during testing. I am adding them so that this page is indexed and administrators reach an authoritative source when they search for Event IDs or descriptions.

Event ID Source Description
56717 LS Data Collection IM was blocked in critical archiving mode due to local Storage Service is full or unavailable.
Cause: Storage Service or its dependent components are not running.
Resolution:
Ensure the local Storage Service database is not full and target storages such as SQL Server or Exchange Server are available.
56800 LS Data Collection Failed to commit session data into the local Storage Service database.
Error:
SessionUpdateException: code=Success, reason=, Unable to finalize session, no session items removed, no new items enqueued
at Microsoft.Rtc.Internal.Storage.Queue.LyssQueueDal.FinalizeSession(StoreContext ctx, Guid adapterID, HashSet`1 sessionIDs, List`1 queueItemList)
at Microsoft.Rtc.Server.UdcAdapters.UcSessionAdapter.WrapperFinalizeSession(StoreContext ctx, LyssQueueDal dal, HashSet`1 sessionIds, List`1 queueItems)
at Microsoft.Rtc.Server.UdcAdapters.UcSessionAdapter.FinalizeSession(StoreContext ctx, LyssQueueDal dal, HashSet`1 sessionIds, List`1 persistItems)
at Microsoft.Rtc.Server.UdcAdapters.UcSessionAdapter.PersistSession(StoreContext ctx, LyssQueueDal dal, SessionState entry, Boolean isCriticalMode)
Cause: Storage Service or its dependent components are not running.
Resolution:
Ensure the local Storage Service database is not full and target storages such as SQL Server or Exchange Server are available.

32042 LS Storage Service Storage Service API failed to add a message to the queue.

Add Queue Message failure. EnqueueException: code=ErrorQueueUnhealthy, reason=Unable to Enqueue Message: Storage Queue is not healthy due to errors: Storage Service Database is full.

. Please retry later.

at Microsoft.Rtc.Internal.Storage.Api.StorageService.BeginEnqueueMessages(EnqueueMessagesRequest enqueueMessagesRequest, AsyncCallback asyncCallback, Object state)

Cause: Authentication or authorization failure, bad input parameters, fabric errors, timeouts, other errors.

Resolution:

Check event details. Ensure that the caller of Storage Service is properly authenticated using windows authentication, and has the required authorization based on security group membership. Verify that inputs were valid. If problem persists, notify your organization's support team with the event detail.
32008 LS Storage Service Unexpected exception.

Message=Error: Path \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\DataArchive\20161122\LyncStd01.contoso.com\ failed to be read for flushed data. Error details: System.IO.IOException: The network path was not found.

at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)

at System.IO.FileSystemEnumerableIterator`1.CommonInit()

at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost)

at System.IO.Directory.GetFiles(String path, String searchPattern, SearchOption searchOption)

at Microsoft.Rtc.Internal.Storage.Sql.LyssDal.CheckFilePathForFlushedFiles(StoreContext ctx, String parentFilePath, Boolean checkArchived, Boolean& errorOccurred, Int32& numDataFilesToReport)

Exception: The network path was not found.

Stack Trace: at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)

at System.IO.FileSystemEnumerableIterator`1.CommonInit()

at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost)

at System.IO.Directory.GetFiles(String path, String searchPattern, SearchOption searchOption)

at Microsoft.Rtc.Internal.Storage.Sql.LyssDal.CheckFilePathForFlushedFiles(StoreContext ctx, String parentFilePath, Boolean checkArchived, Boolean& errorOccurred, Int32& numDataFilesToReport)

Cause: Unexpected exception.

Resolution:

If problem persists, notify your organization's support team with the event detail.
32013 LS Storage Service Cannot perform a LYSS database operation.

Message=#CTX#{ctx:{traceId:18446744072925107599, activityId:"c0af1230-6791-473f-a13a-76795835de80"}}#CTX# FinalizeSession sproc failed: SprocNativeError = [1105]

Exception: System.Data.SqlClient.SqlException (0x80131904): Could not allocate space for object 'dbo.ItemQueue'.'CL_ItemQueue' in database 'lyss' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)

at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()

at System.Data.SqlClient.SqlDataReader.get_MetaData()

at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)

at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds, Boolean describeParameterEncryptionRequest)

at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)

at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)

at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)

at System.Data.SqlClient.SqlCommand.ExecuteReader()

at Microsoft.Rtc.Common.Data.DBCore.Execute(SprocContext sprocContext, SqlConnection sqlConnection, SqlTransaction sqlTransaction)

ClientConnectionId:8d59a7be-4c40-4747-9d00-33b889057e0c

Error Number:1105,State:2,Class:17

Stack Trace: at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)

at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()

at System.Data.SqlClient.SqlDataReader.get_MetaData()

at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)

at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds, Boolean describeParameterEncryptionRequest)

at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)

at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)

at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)

at System.Data.SqlClient.SqlCommand.ExecuteReader()

at Microsoft.Rtc.Common.Data.DBCore.Execute(SprocContext sprocContext, SqlConnection sqlConnection, SqlTransaction sqlTransaction)

Cause: Cannot perform an LYSS database operation.

Resolution:

Verify that the data is valid and that the LYSS database is available and healthy. If this error caused by Violation of UNIQUE KEY constraint 'CL_ItemQueue', then most likely it is due to at attempt to load a duplicate item from the file share. If so, find flushed xml file that contains duplicated key and move the xml file to somewhere else. In addition, please verify the file share is healthy.
32059 LS Storage Service Space Used by Storage Service DB is at or above the Critical Threshold.

SQL Edition=Express Edition (64-bit); Space Used Percent=87.5; Critical Threshold Percent=80 queue item counts summary:

owned: True, status: 2, critical: True, count: 356

Total queue items: 356, total archived items: 0

Cause: The DB size can grow bigger under heavier usage as the data in the Storage Service Queue and/or Cache grows. Once Storage Service finishes processing the data, the db will shrink back to normal size. However breaching the critical threshold implies that the normal processing of the data is slow or blocked resulting in so much excessive DB growth that service functionality is now affected and blocked.

Resolution:

Check event details to find the root cause of why data is not getting processed. Resolve the root cause to allow Storage Service to start shrinking the DB down naturally. If problem persists, notify your organization's support team with the event details.

32089 LS Storage Service A flush of queue items from the Storage Service DB was initiated, and items were exported to the file system.

Queue size: Error, flushed 1 files to the filesystem. success: True.

Files: \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\DataArchive\20161122\LyncStd01.contoso.com\e1dc38d13ed15269b601a5460e8f9631__1.xml

Cause: Periodically, or in reaction to the size of the Storage Service queue, we may purge items from the database, exporting them to the file system in order to ensure performance isn't impacted due to the accumulation of data. These items should be re-imported after the root cause of the accumulation is resolved. Typically this would occur due to an outage of a data storage endpoint (like Exchange), or could be due to a sustained period of high system load.

Resolution:

The resource kit tool is available to import exported items back into the DB for processing.

32090 LS Storage Service Flushed queue Items from the Storage Service DB have been left unattended to for some amount of time and require attention to be imported back.

Parent Path \\contoso.com\LyncFileShare\1-WebServices-1\StorageService\ . 112 data files are over 5 days old.

Cause: Periodically, or in reaction to the size of the Storage Service Queue, we may purge items from the database, exporting them to the file system in order to ensure performance isn’t impacted due to the accumulation of data. These items should be re-imported after the root cause of the accumulation has been resolved. Typically this would occur due to an outage of a data storage endpoint (like Exchange), or could be due to a sustained period of load
32080 LS Storage Service A queue flush operation has encountered a file error.

Preliminary primary fileShareName parameter: \\contoso.com\LyncFileShare\1-WebServices-1\StorageService is unusable. Exception: System.IO.DirectoryNotFoundException: Failed to get DirectoryInfo of \\contoso.com\LyncFileShare\1-WebServices-1\StorageService at Microsoft.Rtc.Internal.Storage.Sql.LyssDal.ValidateFileShareName(StoreContext ctx, String fileShareName, String timestamp, LyssDBUsageStatus usageLevel, Boolean isTenantMigration)

Cause: There may be permission issues to the file share, local file location, temporary directory, or disk is full.

Resolution:

Please check event detail and trace log for more information. Please ensure there is write permission to required file locations.


References:
Import Storage Service Data
Archiving Options in Lync Server 2013
The LCSLog SQL Database is not logging any archiving content
Understanding Monitoring and Archiving on Lync Server 2013
