FCI failover fails with "ReAclDirectory : Failed to apply security" after applying a patch

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

A customer was applying SQL Server 2014 SP3 to a failover cluster instance (FCI). They applied the patch to the passive node and then failed over to that node, but the failover failed. The issue reproduced every time, and the only mitigation was to uninstall SQL Server 2014 SP3. The cluster log showed the following:

00005f04.00005968::2022/10/07-00:06:42.743 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Request to bring SQL Server online
00005f04.00005968::2022/10/07-00:06:42.743 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceFailed' to 'ClusterResourceOnlinePending'
00001554.00003fa4::2022/10/07-00:06:42.745 INFO [RCM] HandleMonitorReply: ONLINERESOURCE for 'SQL Server (CAxxxxxDB)', gen(11) result 997/0.
00001554.00003fa4::2022/10/07-00:06:42.745 INFO [RCM] Res SQL Server (CAxxxxxDB): OnlineCallIssued -> OnlinePending( StateUnknown )
00001554.00003fa4::2022/10/07-00:06:42.745 INFO [RCM] TransitionToState(SQL Server (CAxxxxxDB)) OnlineCallIssued-->OnlinePending.
00005f04.00001d14::2022/10/07-00:06:42.745 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Online worker is started
00001554.00001db8::2022/10/07-00:06:42.745 INFO [GEM] Node 2: Processing message as part of GemRepair message 2:35071 from node 2. Action: causal, Target: CAUS
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] XEvent session CAxxxxxDB is created with RolloverCount 10, MaxFileSizeInMBytes 100, and LogPath 'C:\ClusterStorage\VirtualDisk-CAxxxxxDB\Data\MSSQL13.CAxxxxxDB\MSSQL\LOG\'
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Extended Event logging is started
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The private property VerboseLogging is 0
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The private property HealthCheckTimeout is 60000
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The private property FailureConditionLevel is 3
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The private property SqlDumperDumpFlags is 0x0
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The private property SqlDumperDumpTimeOut is 0
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The private property SqlDumperDumpPath is ''
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The property LogIsEnabled is 1
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The property LogFileRolloverCount is 10
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The property LogMaxFileSizeInMBytes is 100
00005f04.00001d14::2022/10/07-00:06:42.831 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] The property LogPath is ''
00005f04.00001d14::2022/10/07-00:06:42.833 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Server name is GOAAZRVDB226\CAxxxxxDB
00005f04.00001d14::2022/10/07-00:06:42.833 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Service name is MSSQL$CAxxxxxDB
00005f04.00001d14::2022/10/07-00:06:42.833 INFO [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Dependency expression for resource 'SQL Network Name (xxxxxx)' is '([5bxxxxf4-3e0d-4787-9e65-769xxxxx68])'
00005f04.00006a48::2022/10/07-00:06:42.835 ERR [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Worker Thread (43A1E9F0): Failed to retrieve the ftdata root registry value (hr = 2147942402, last error = 0). Full-text upgrade will be skipped.
00005f04.00006a48::2022/10/07-00:06:42.927 WARN [RES] SQL Server <SQL Server (CAxxxxxDB)>: [sqsrvres] Worker Thread (43A1E9F0): ReAclDirectory : Failed to apply security to C:\ClusterStorage\VirtualDisk-CAxxxxxDB\Data\MSSQL13.CAxxxxxDB\MSSQL\Data (50).
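
In a large cluster log (for example one generated with the Get-ClusterLog PowerShell cmdlet), the ReAclDirectory warning is easy to miss. The following Python sketch scans log lines for it and extracts the timestamp, the directory, and the numeric code in parentheses. It assumes the message text has exactly the shape of the warning shown above; the script and the function names are hypothetical and not part of any Microsoft tool.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Assumed message shape, taken from the warning above:
# "... ReAclDirectory : Failed to apply security to <path> (<code>)."
REACL_PATTERN = re.compile(
    r"ReAclDirectory : Failed to apply security to (?P<path>.+?) \((?P<code>\d+)\)\.?\s*$"
)


class ReAclFailure(NamedTuple):
    timestamp: str  # e.g. "2022/10/07-00:06:42.927" from the line prefix
    path: str       # directory that sqsrvres tried to re-ACL
    code: int       # numeric code reported in parentheses


def find_reacl_failures(lines: Iterable[str]) -> Iterator[ReAclFailure]:
    """Yield every ReAclDirectory failure found in the given log lines."""
    for line in lines:
        match = REACL_PATTERN.search(line)
        if not match:
            continue
        # The prefix looks like "processid.threadid::YYYY/MM/DD-HH:MM:SS.mmm".
        prefix = line.split(" ", 1)[0]
        timestamp = prefix.split("::", 1)[1] if "::" in prefix else ""
        yield ReAclFailure(timestamp, match.group("path"), int(match.group("code")))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_reacl.py Cluster.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for failure in find_reacl_failures(log):
            print(failure.timestamp, failure.code, failure.path)
```

The non-greedy path group stops at the last " (digits)." on the line, so a path that itself contains parentheses, such as "Program Files (x86)", is still captured whole.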

 

We first checked the registry for keys that might point to a wrong path and cause this issue, but we did not find any incorrect registry key.

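Next, we captured a dump of rhs.exe (the Resource Hosting Subsystem process that hosts the SQL Server resource DLL, sqsrvres) to analyze the issue. The dump showed that the worker thread was applying security to the SQL Server error log folder: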

00 ntdll!ZwSetSecurityObject
01 KERNELBASE!SetKernelObjectSecurity
02 ntmarta!MartaSetFileRights
03 ntmarta!MartaUpdateTree
04 ntmarta!MartaManualPropagation
05 ntmarta!AccRewriteSetHandleRights
06 advapi32!SetSecurityInfo
07 SQSRVRES!SQLClusterSharedDataUpgradeWorker::ReAclDirectory
08 SQSRVRES!SQLClusterSharedDataUpgradeWorker::DoSQLDataRootApplyACL
09 SQSRVRES!SQLClusterSharedDataUpgradeWorker::Execute
0a SQSRVRES!SQLClusterResourceWorker::WorkerStartRoutine
0b resutils!ClusWorkerStart
0c kernel32!BaseThreadInitThunk
0d ntdll!RtlUserThreadStart

wchar_t * directory = 0x00000000`01bdf130 "D:\MSSQL11.MSSQLSERVER\MSSQL\Log"

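We checked the SQL Server error log folder and found more than 48,000 files in it. In the stack above, SetSecurityInfo propagates the new security settings through the whole directory tree (ntmarta!MartaManualPropagation and ntmarta!MartaUpdateTree), so the time this step takes grows with the number of files. There appears to be a timeout on this step: when the folder contains too many files, the timeout is hit and the failover fails.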
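
To check whether a folder is in the same situation before patching, the files under it can be counted. This is a minimal Python sketch, assuming the folder of interest is the one identified from the dump (for example the error log folder above); count_files is a hypothetical helper, and it only counts files, it does not reproduce what sqsrvres itself does.

```python
import os
import sys
from collections import Counter


def count_files(root):
    """Walk root recursively; return (total file count, Counter of files per directory)."""
    per_dir = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        per_dir[dirpath] = len(filenames)
    return sum(per_dir.values()), per_dir


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_files.py "D:\MSSQL11.MSSQLSERVER\MSSQL\Log"
    total, per_dir = count_files(sys.argv[1])
    print(f"{total} files under {sys.argv[1]}")
    # Show the five directories holding the most files.
    for path, n in per_dir.most_common(5):
        print(f"{n:>8}  {path}")
```

The per-directory breakdown helps show whether the files are concentrated in one place (for example thousands of dump or report files in the log folder) or spread across subfolders.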

 

The customer deleted files that were no longer useful ('Maintenance Plan logs', 'Rebuild index logs', and 'Old Dump files'), which reduced the number of files in the folder to 300. After that, the failover succeeded.
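
A cleanup like this can be scripted. The Python sketch below lists, and optionally deletes, files directly in a folder whose names match given patterns and that are older than a given number of days. The patterns (EXAMPLE_PATTERNS) and the 30-day cutoff are assumptions for illustration only; check which files are safe to remove in your environment and add patterns for your own maintenance plan and index rebuild reports.

```python
import fnmatch
import os
import sys
import time

# Hypothetical example patterns for old dump files; adjust to the names actually
# present in your folder. They deliberately do not match ERRORLOG or ERRORLOG.n.
EXAMPLE_PATTERNS = ["SQLDump*.mdmp", "SQLDump*.txt", "SQLDump*.log"]


def find_old_files(folder, patterns, older_than_days, now=None):
    """Return sorted paths of files directly in folder whose name matches one of
    patterns (case-insensitive) and whose modification time is before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    old = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            name = entry.name.lower()
            if not any(fnmatch.fnmatchcase(name, p.lower()) for p in patterns):
                continue
            if entry.stat().st_mtime < cutoff:
                old.append(entry.path)
    return sorted(old)


def clean_up(folder, patterns, older_than_days, dry_run=True):
    """Delete the selected files, or only print them when dry_run is True."""
    victims = find_old_files(folder, patterns, older_than_days)
    for path in victims:
        if dry_run:
            print("would delete:", path)
        else:
            os.remove(path)
    return victims


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage (dry run only): python clean_logs.py <log folder>
    clean_up(sys.argv[1], EXAMPLE_PATTERNS, older_than_days=30)
```

Dry run is the default so nothing is deleted until the list has been reviewed; the sketch only looks at the top level of the folder and never descends into subfolders.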
