Known Issue for Service Fabric Windows Server 2016 Clusters

This post has been republished via RSS; it originally appeared at: MSDN Blogs.

We have become aware of a Windows Defender regression that can block Service Fabric cluster upgrades. 

Certain Windows Defender signatures contain a regression that can hang Windows Task Scheduler, blocking Windows applications that rely on it. You can learn more about this regression in this post. 

Impacted Customers: 

This regression only impacts Service Fabric cluster upgrades for customers running Windows Server 2016. Other OS images are not affected, and neither are service application upgrades. 

On a machine impacted by the regression, Windows Task Scheduler is in a hung state. Any type of Service Fabric cluster upgrade takes down the Service Fabric node hosted on that machine, and the node stays down until the OS is rebooted or the machine is reimaged. 

Signatures that contain the regression have been rolled out broadly to most Azure machines running Windows Server 2016 with Windows Defender and a standard Defender signature update configuration. It is likely that Task Scheduler is already in a hung state on some Windows Server 2016 machines. On those machines, any Service Fabric cluster upgrade could bring down Service Fabric nodes and then roll back due to node health policy violations. 

Mitigation Steps: 

If you started a Service Fabric cluster upgrade that brought down nodes, and they remained down after the rollback, please follow the steps below: 

  1. On machines hosting down nodes, use %systemroot%\System32\SchTasks.exe to verify whether Task Scheduler is in a hung state. The command should return the list of scheduled tasks; if it does not return, Task Scheduler is hung. 
  2. Reboot the machines and check if the nodes come back. If they don’t, reimage the machines. 
  3. If you cannot even RDP into the machines, directly reimage them. 
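
The check in step 1 can be scripted so that a hung Task Scheduler does not also hang your session. The following is a minimal sketch, not an official tool: the helper names `command_hangs` and `schtasks_path` are hypothetical, the 60-second timeout is an arbitrary assumption you should tune, and `/Query` is simply the explicit form of the task listing that SchTasks.exe performs.

```python
import os
import subprocess
import sys


def command_hangs(cmd, timeout_s=60.0):
    """Return True if `cmd` does not exit within `timeout_s` seconds.

    The timeout value is an assumption; the original guidance only says
    the command "does not return" when Task Scheduler is hung.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return True
    return False


def schtasks_path():
    """Build %systemroot%\\System32\\SchTasks.exe from the environment."""
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    return os.path.join(system_root, "System32", "SchTasks.exe")


if __name__ == "__main__" and sys.platform == "win32":
    if command_hangs([schtasks_path(), "/Query"]):
        print("SchTasks.exe did not return: Task Scheduler appears hung.")
    else:
        print("SchTasks.exe returned: Task Scheduler is responding.")
```

A timeout only suggests a hang; a machine with a very large number of scheduled tasks or heavy load could also be slow to respond, so treat the result as a prompt to investigate before rebooting.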

SF runtime 6.3 support will be extended to March 31st, 2019. 

If you need specific assistance with this issue, please reach out to us through Azure Portal Help. In addition, here are your general support options for Service Fabric: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-support#report-production-issues-or-request-paid-support-for-azure. We will keep you apprised of any updates on this matter through the blog. Thank you! 
