Troubleshooting Hybrid Migration Endpoints in Classic and Modern Hybrid

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

In our previous blog post we covered an overview of what migration endpoints are, how to find them and what makes them tick. In this post, we will cover related troubleshooting. Note that this post has some in-depth troubleshooting steps, so it is not necessarily something that you’ll read for fun, but we wanted to make it available for those times when you run into trouble!

Now let’s cover some troubleshooting!

Things that are commonly mis-configured

Before getting to the part where we troubleshoot specific migration endpoint issues, be aware of the following top reasons why a migration endpoint cannot be created:

Troubleshooting

Whether you are having trouble creating new migration endpoints in Office 365 Exchange Online or are not able to migrate anymore to or from Exchange Online using an existing migration endpoint, the cmdlet Test-MigrationServerAvailability is your dear friend. Always run this cmdlet in Exchange Online PowerShell, not from on-premises Exchange Management Shell.

We will focus on hybrid migration endpoint issues and the cmdlet syntax for this endpoint type. Below we will show you 3 commands that can help you check for underlying issues or error messages:

1. Hybrid remote move endpoint with Autodiscover

Test-MigrationServerAvailability -ExchangeRemoteMove -Autodiscover -EmailAddress user@contoso.com -Credentials (get-credential contoso\administrator)

[Screenshot: output of Test-MigrationServerAvailability using Autodiscover]

Note: The option with Autodiscover is not used in Modern hybrid as we go directly to EWS server(s) for both Migration Endpoints and Free/Busy configuration (Cloud Intra-Organization Connectors and Organization Relationships have TargetSharingEpr set to the EWS namespace.)

2. Hybrid remote move endpoint without Autodiscover (testing EWS directly)

Test-MigrationServerAvailability -ExchangeRemoteMove -RemoteServer mail.contoso.com -Credentials (Get-Credential contoso\administrator)

[Screenshot: output of Test-MigrationServerAvailability against the EWS host name]

Note: In Modern hybrid, the RemoteServer is in the format '<GUID>.resource.mailboxmigration.his.msappproxy.net', where <GUID> is unique for each organization. The GUID is randomly generated when Modern HCW reaches that configuration step. It is stored, compressed and Base64-encoded, in the Comment property of the OnPremisesOrganization object in Exchange Online. The same GUID is stamped on the migration endpoint's RemoteServer value for both Full and Minimal Modern Hybrid topologies. It also appears in the TargetSharingEpr values of the cloud Intra-Organization Connector / Organization Relationship. You can check the GUID in the HCW log and in the output of the Get-MigrationEndpoint / Get-IntraOrganizationConnector / Get-OrganizationRelationship EXO cmdlets.

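[Screenshot: the GUID as shown by Get-MigrationEndpoint, Get-IntraOrganizationConnector, Get-OrganizationRelationship and the decoded OnPremisesOrganization Comment]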

Cmdlets run above:

Get-IntraOrganizationConnector | fl TargetSharingEpr
Get-OrganizationRelationship | fl TargetSharingEpr
Get-MigrationEndpoint

# Decode the OnPremisesOrganization Comment (Base64-encoded, DEFLATE-compressed text)
$strdata = (Get-OnPremisesOrganization).Comment
$bytes = [Convert]::FromBase64String($strdata)
$ms = New-Object System.IO.MemoryStream(@(,$bytes))
$deflate = New-Object System.IO.Compression.DeflateStream($ms, [System.IO.Compression.CompressionMode]::Decompress)
$reader = New-Object System.IO.StreamReader($deflate)
$text = $reader.ReadToEnd()
$text
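
You may want to decode the Comment value on a machine where the .NET classes above are not handy. The same steps can be sketched in Python. This is a minimal sketch that relies on three assumptions, all implied by the PowerShell above:

- the value is raw DEFLATE data (no zlib header), Base64-encoded;
- the decompressed text is UTF-8;
- `encode_sample` is a hypothetical helper that only builds test input.

The layout of the decoded text is not described in this post.

```python
import base64
import zlib


def decode_onprem_comment(comment: str) -> str:
    """Decode an OnPremisesOrganization Comment value.

    Same steps as the PowerShell above: Base64 -> raw DEFLATE -> text.
    .NET DeflateStream reads raw DEFLATE (no zlib header), hence wbits=-15.
    'utf-8-sig' drops a leading BOM, as StreamReader does by default.
    """
    compressed = base64.b64decode(comment)
    return zlib.decompress(compressed, -15).decode("utf-8-sig")


def encode_sample(text: str) -> str:
    """Hypothetical helper: build a value in the same format for testing."""
    compressor = zlib.compressobj(wbits=-15)  # raw DEFLATE, like .NET
    raw = compressor.compress(text.encode("utf-8")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")
```

To use it, paste the string returned by (Get-OnPremisesOrganization).Comment into decode_onprem_comment().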

3. Testing an existing hybrid remote move endpoint

Test-MigrationServerAvailability -Endpoint <Identity of the Endpoint>

[Screenshot: output of Test-MigrationServerAvailability for an existing endpoint]

OK, I ran these and found errors; now what?

Let’s walk through some common Test-MigrationServerAvailability errors and how to troubleshoot these in Classic / Modern hybrid.
Tip: Whenever a command returns an error in PowerShell, you should run the command $Error[0].Exception |fl -f to get more details on the exception thrown.

However, Test-MigrationServerAvailability failures are not the red errors you get when a PowerShell command fails, so they do not land in $Error. In this situation, we can run New-MoveRequest, which throws the same error as a real exception, and get the serialized exception from that command instead.
Here is the result returned by Test-MigrationServerAvailability:

[Screenshot: Test-MigrationServerAvailability result]

Running New-MoveRequest to test the migration of a synced user gives the same error message, but "in red". We can then get the serialized exception from it.

[Screenshot: the same error returned by New-MoveRequest, with the serialized exception]

Based on these error messages, we have grouped the most common scenarios below.
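
When you have a batch of error messages, for example collected from several endpoints, a small helper can sort each one into the scenarios that follow. This is a minimal sketch. The substring patterns are copied from the error texts quoted in this post, and they are an assumption rather than an exhaustive list of what Exchange Online can return. The names SCENARIOS and classify_error are hypothetical.

```python
import re

# Patterns copied from the error messages quoted in the scenarios below.
# The first scenario with a matching pattern wins.
SCENARIOS = [
    (1, "connectivity / timeout", [r"timed out", r"\(504\)", r"allotted timeout"]),
    (2, "403 Forbidden", [r"\(403\) Forbidden"]),
    (3, "401 Unauthorized / access denied / credentials",
     [r"\(401\) Unauthorized", r"access denied", r"invalid credentials",
      r"using the credentials provided"]),
    (4, "SSL / TLS", [r"SSL/TLS", r"certificate is invalid"]),
    (5, "503 Service Unavailable", [r"\(503\)"]),
]


def classify_error(message: str):
    """Return (scenario number, label) for an error message, or None."""
    for number, label, patterns in SCENARIOS:
        if any(re.search(p, message, re.IGNORECASE) for p in patterns):
            return number, label
    return None
```

An unmatched message returns None, which is a cue to read the full serialized exception instead.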

Scenario 1: Test-MigrationServerAvailability fails because of connectivity / timeout errors

Examples of some of those errors:

The call to 'https://<GUID>.resource.mailboxmigration.his.msappproxy.net/EWS/mrsproxy.svc' timed out. Error details: The request channel timed out attempting to send after 00:00:00. The time allotted to this operation may have been a portion of a longer timeout.

The call to 'https://mail.contoso.com/EWS/mrsproxy.svc' timed out. Error details: The operation did not complete within the allotted timeout of 00:00:50. The time allotted to this operation may have been a portion of a longer timeout.

The request channel timed out while waiting for a reply after 00:00:09.9996191. The time allotted to this operation may have been a portion of a longer timeout.

The remote server returned an error: (504) Gateway Timeout.

Troubleshooting these timeout errors in Modern hybrid:


During the Modern hybrid configuration, you will be asked for the credentials of the on-premises migration admin. These can be the same credentials you entered at the beginning of HCW or different ones. HCW needs the on-premises migration admin credentials to run the Test-MigrationServerAvailability cmdlet. This account can have fewer privileges than the admin account running HCW, which needs the Organization Management role. Once Modern HCW has tested on-premises migration server availability, we know whether the migration endpoint can be created in Exchange Online. This step also shows the dynamically generated GUID for your on-premises MRSProxy server(s): https://<GUID>.resource.mailboxmigration.his.msappproxy.net/EWS/mrsproxy.svc.
If you get a timeout error in HCW when testing migration server availability, first search the HCW log for the Test-MigrationServerAvailability cmdlet that HCW executed. Copy the exact command to a Notepad file, or at least note the RemoteServer value. Then connect to Exchange Online PowerShell and run the same command to confirm the error message reported by HCW.
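
If the HCW log is long, you can pull the command and its RemoteServer value out with a short script. This is a minimal sketch resting on two assumptions: HCW writes the Test-MigrationServerAvailability command as plain text on a single log line, and the command uses the -RemoteServer form. The actual HCW log layout is not described in this post, so adjust the patterns if nothing is found. The function name is hypothetical.

```python
import re

# Assumption: the command HCW ran appears as plain text on one log line.
COMMAND_RE = re.compile(r"Test-MigrationServerAvailability\b[^\r\n]*", re.IGNORECASE)
# Accepts "-RemoteServer value", "-RemoteServer:value" and quoted values.
REMOTE_RE = re.compile(r"-RemoteServer\s*:?\s*['\"]?([^'\"\s]+)", re.IGNORECASE)


def find_tmsa_commands(log_text: str):
    """Return (command, RemoteServer value or None) for each command found."""
    found = []
    for match in COMMAND_RE.finditer(log_text):
        command = match.group(0).strip()
        remote = REMOTE_RE.search(command)
        found.append((command, remote.group(1) if remote else None))
    return found
```

For example, run find_tmsa_commands(open(path, encoding="utf-8", errors="replace").read()) against the HCW log file. Then paste the returned command into Exchange Online PowerShell.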

Connect to Exchange Online PowerShell using an Office 365 Global Admin account. For example, you can use this one-line command:

Import-PSSession $(New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://outlook.office365.com/powershell-liveid/ -Authentication Basic -AllowRedirection -Credential $(Get-Credential))

Once you are connected to Exchange Online PowerShell, run the same command that HCW ran, as recorded in the log file. Provide the credentials of the on-premises migration admin and your RemoteServer value, and check whether you get the same error or a different one:

Test-MigrationServerAvailability -ExchangeRemoteMove:$true -RemoteServer '<GUID>.resource.mailboxmigration.his.msappproxy.net' -Credentials (Get-Credential domain\admin)

Most likely you will get the same result that HCW did. This step may seem redundant, but it is always a good idea to analyze the command HCW ran and to confirm that the error was not a transient one. For example, if you selected the Classic Hybrid option during HCW, HCW should not try to create the migration endpoint to <GUID>.resource.mailboxmigration.his.msappproxy.net, which is used only by the Modern Hybrid topology.
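
A RemoteServer value that points at the Hybrid Agent namespace while you run Classic hybrid is exactly the mismatch described above. Per the earlier note, the same GUID should also appear in every Modern value. The sketch below checks both points. It assumes only the '<GUID>.resource.mailboxmigration.his.msappproxy.net' naming given in this post. The function names and the sample values in the test are hypothetical.

```python
from urllib.parse import urlsplit

MODERN_SUFFIX = ".resource.mailboxmigration.his.msappproxy.net"


def endpoint_topology(value: str):
    """Classify a RemoteServer or TargetSharingEpr value.

    Returns ("modern", guid) for the Hybrid Agent proxy namespace and
    ("classic", None) for anything else, e.g. a published EWS host name.
    """
    value = value.strip()
    # urlsplit().hostname is already lower-cased; bare host names are not.
    host = urlsplit(value).hostname if "://" in value else value.lower()
    host = (host or "").rstrip(".")
    if host.endswith(MODERN_SUFFIX):
        return "modern", host[: -len(MODERN_SUFFIX)]
    return "classic", None


def guids_consistent(values) -> bool:
    """True if every Modern value carries the same GUID prefix."""
    guids = {guid for kind, guid in map(endpoint_topology, values) if kind == "modern"}
    return len(guids) <= 1
```

Feed it the RemoteServer value from Get-MigrationEndpoint and the TargetSharingEpr values from Get-IntraOrganizationConnector and Get-OrganizationRelationship. A "modern" result on a Classic deployment, or an inconsistent GUID, is worth investigating.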

Once you confirm that you still get the error HCW reported and you are running Modern Hybrid, you will need to investigate further.

On the server where you ran Modern HCW, import the Hybrid Agent cmdlets by running Import-Module .\HybridManagement.psm1 from the \Program Files\Microsoft Hybrid Service\ directory, as described here. Then run the following command to see the Hybrid Agent status (active or inactive):

Get-HybridAgent -credential (get-credential) # cloud admin credentials

Check that the Hybrid Agent(s) are ACTIVE.

If an agent is INACTIVE, check:

  • Whether you switched from Modern hybrid to Classic hybrid, because this uninstalls the agent. You may have proceeded with the Classic hybrid topology and the Hybrid Agent was successfully uninstalled. In that case, you can ignore the HCW warning about the migration endpoint for <GUID>.resource.mailboxmigration.his.msappproxy.net and create the migration endpoint in the EAC using Autodiscover or your published EWS URL. At the time of writing, if you successfully switch from Modern to Classic, HCW still tries to create the migration endpoint through the Hybrid Agent proxy instead of using your published EWS URL.
  • Whether the Hybrid Service is installed and running on the machine, and whether the Hybrid Agent machine itself is running.
  • Everything listed under Additional Information here, to confirm that the agent is installed properly.

Most importantly, the Hybrid Agent may be ACTIVE while Test-MigrationServerAvailability still returns 'unable to connect to the server'. In that case, use Performance Monitor on the Agent machine to confirm whether the requests arrive.
Watch the request counters (number of requests) on the Agent machine while you run Test-MigrationServerAvailability against the Hybrid Agent. If they go up, the problem is likely on the on-premises server. If they don't, the problem is probably with either the connector or the cloud configuration. If we suspect on-premises (most likely the case), check the on-premises infrastructure again, especially the proxy and firewall settings. The following might be helpful: install requirements, system requirements, and port and protocol requirements.
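
The triage above boils down to a small decision, and the sketch below only restates it. The post does not name a specific Performance Monitor counter. Comparing two readings, taken before and after the test, is an illustrative assumption, as is the function name.

```python
def likely_problem_area(agent_status: str, requests_before: int, requests_after: int) -> str:
    """Restate the Modern hybrid triage described above.

    agent_status: status reported by Get-HybridAgent (ACTIVE / INACTIVE).
    requests_before / requests_after: readings of the request counter you
    watch in Performance Monitor on the Agent machine, taken before and
    after running Test-MigrationServerAvailability against the agent.
    """
    if agent_status.strip().lower() != "active":
        # Check the Modern->Classic switch, the Hybrid Service and the agent install.
        return "agent"
    if requests_after > requests_before:
        # Requests reach the agent: look at on-premises proxy and firewall settings.
        return "on-premises"
    # Requests never reach the agent.
    return "connector-or-cloud"
```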

Troubleshooting these errors in Classic hybrid:

Check your network device logs and the IIS / HTTPProxy logs for the time when you run the Test-MigrationServerAvailability command. If the timeout happens very quickly (in under 50 seconds), a network device is probably blocking or closing the connection.
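
The timeout messages quoted above contain .NET TimeSpan values such as 00:00:09.9996191. That makes it easy to apply the 50-second rule of thumb mechanically. This is a minimal sketch, assuming the hh:mm:ss[.fraction] format seen in those messages. Run it on error message text only, because log timestamps would also match. The function names are hypothetical.

```python
import re

# .NET TimeSpan text as it appears in the timeout messages, e.g. 00:00:09.9996191
TIMESPAN_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?\b")


def timespans_in(message: str):
    """Return every hh:mm:ss[.fraction] value in the message, in seconds."""
    values = []
    for hours, minutes, seconds, fraction in TIMESPAN_RE.findall(message):
        total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if fraction:
            total += int(fraction) / 10 ** len(fraction)
        values.append(total)
    return values


def quick_timeout(message: str, threshold_s: float = 50.0) -> bool:
    """True if the message reports a duration below the threshold.

    False also covers messages without any duration (e.g. a bare 504).
    """
    values = timespans_in(message)
    return bool(values) and max(values) < threshold_s
```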

Location of IIS and HTTPProxy logs:

  • IIS logs Default Web Site (DWS): %SystemDrive%\inetpub\logs\LogFiles\W3SVC1

Example: C:\inetpub\logs\LogFiles\W3SVC1

The name of the IIS logs contains the date of the log, for example u_ex190930.log is from Sept 30, 2019.

  • HTTPProxy logs for EWS:  %ExchangeInstallPath%Logging\HttpProxy\Ews

Example: C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\Ews

The name of the HTTPProxy logs contains the date and hour starting to log, for example HttpProxy_2019093014-10.LOG (10th log from Sept 30, 2019, starting hour 14:00 UTC)
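
Given the UTC time of a failed test, the naming rules above tell you which files to open. This is a minimal sketch, assuming daily IIS logs named u_exYYMMDD.log and HttpProxy logs named HttpProxy_YYYYMMDDHH-N.LOG, as in the examples. Check neighbouring files as well if an entry is missing. The function names are hypothetical.

```python
import re
from datetime import datetime

HTTPPROXY_RE = re.compile(r"^HttpProxy_(\d{10})-(\d+)\.LOG$", re.IGNORECASE)


def iis_log_name(day: datetime) -> str:
    """Daily IIS log name for a UTC date, e.g. u_ex190930.log."""
    return day.strftime("u_ex%y%m%d.log")


def parse_httpproxy_name(name: str):
    """Return (UTC start hour, sequence number), or None for other names."""
    match = HTTPPROXY_RE.match(name)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H"), int(match.group(2))


def httpproxy_candidates(names, when: datetime):
    """HttpProxy file names whose start hour matches the UTC failure time,
    ordered by sequence number."""
    hour = when.replace(minute=0, second=0, microsecond=0)
    picked = []
    for name in names:
        parsed = parse_httpproxy_name(name)
        if parsed and parsed[0] == hour:
            picked.append((parsed[1], name))
    return [name for _, name in sorted(picked)]
```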

Below are example IIS log entries for a successful request (200 OK) and for a failed request (500). The failed request could correspond to a timeout error in Test-MigrationServerAvailability if the request reached IIS / Exchange Server.

IIS logs - 200 OK
2019-08-28 06:57:38 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 0 0 0
2019-08-28 06:57:42 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 1 2148074254 15
2019-08-28 06:57:42 192.168.2.50 POST /EWS/mrsproxy.svc - 443 contoso\administrator 4.4.0.1 - 200 0 0 125
2019-08-28 06:57:42 192.168.2.50 POST /EWS/mrsproxy.svc - 443 contoso\administrator 4.4.0.1 - 200 0 0 296

IIS logs - 500 error
2019-08-28 07:15:48 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 2 5 4890
2019-08-28 07:15:52 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 1 2148074254 0
2019-08-28 07:15:55 192.168.2.50 POST /EWS/mrsproxy.svc - 443 contoso\administrator 4.4.0.1 - 500 0 0 2562
2019-08-28 07:15:55 192.168.2.50 POST /EWS/mrsproxy.svc - 443 contoso\administrator 4.4.0.1 - 500 0 0 93
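
On a busy Client Access server it helps to filter and count the mrsproxy.svc entries rather than reading them by eye. This minimal sketch assumes the field layout of the sample lines above:

- fields 0 to 8 are date, time, s-ip, method, uri-stem, uri-query, s-port, cs-username and c-ip;
- the last four are sc-status, sc-substatus, sc-win32-status and time-taken.

Your own #Fields header may differ, so check it first. The names are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class IisEntry(NamedTuple):
    timestamp: str
    username: str
    client_ip: str
    status: int
    substatus: int
    win32_status: int
    time_taken_ms: int


def parse_iis_line(line: str) -> Optional[IisEntry]:
    """Parse one mrsproxy.svc line laid out like the samples above."""
    if line.startswith("#"):
        return None  # W3C header / comment lines such as #Fields
    fields = line.split()
    if len(fields) < 13 or "mrsproxy.svc" not in fields[4].lower():
        return None
    status, substatus, win32, taken = (int(f) for f in fields[-4:])
    return IisEntry(f"{fields[0]} {fields[1]}", fields[7], fields[8],
                    status, substatus, win32, taken)


def status_summary(lines):
    """Count (sc-status, sc-substatus) pairs for mrsproxy.svc requests."""
    entries = (parse_iis_line(line) for line in lines)
    return Counter((e.status, e.substatus) for e in entries if e)
```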

If you don't see the failed requests in the IIS logs, make sure you allow all Exchange Online IP addresses to connect to your on-premises servers. Then check the firewall logs to see whether connections were blocked.
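
To check that your firewall or reverse proxy allow list covers every Exchange Online range, you can compare the two lists of CIDRs. This is a minimal sketch. The required ranges must come from Microsoft's published Office 365 URL and IP address list. The ranges in the test are documentation addresses (RFC 5737 / RFC 3849), not real Exchange Online ranges. The function name is hypothetical.

```python
import ipaddress


def uncovered_ranges(required, allowed):
    """Return the CIDRs in `required` not fully inside any CIDR in `allowed`.

    required: Exchange Online ranges taken from Microsoft's published list.
    allowed:  source ranges your firewall / reverse proxy lets through.
    Adjacent allowed ranges are not merged, so a required range split
    across two allowed entries is reported as uncovered.
    """
    allowed_nets = [ipaddress.ip_network(c, strict=False) for c in allowed]
    missing = []
    for cidr in required:
        net = ipaddress.ip_network(cidr, strict=False)
        # subnet_of() needs matching IP versions, so compare versions first.
        if not any(net.version == a.version and net.subnet_of(a) for a in allowed_nets):
            missing.append(cidr)
    return missing
```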

This is an extract from HTTP proxy logs with a 500 error code and a timeout when proxying to another Exchange server:

2019-09-30T12:02:55.930Z,a0ff365c-741b-4e59-b2e3-760991d3a27a,15,1,1713,5,,Ews,exch01.contoso.com,/EWS/mrsproxy.svc,,Negotiate,true,contoso\admin,,OrganizationId~OrganizationAnchor@,,40.100.175.55,exch01,500,,ServerLocatorError,POST,,,,,ForestWideOrganization,,,2807,664,,,,1,2819,0;,0,,0,8;2;,10,10,,0,2821,0,,,,,,,,,0,2819,0,,2819,,2820,2820,,,,BeginRequest=2019-09-30T12:02:53.109Z;CorrelationID=<empty>;ProxyState-Run=None;ServerLocatorCall=DM:a10ad628-e020-409e-9f1e-22a700182ac1~~contoso.structure;ProxyState-Complete=CalculateBackEnd;SharedCacheGuard=0;EndRequest=2019-09-30T12:02:55.930Z;S:ServiceCommonMetadata.Cookie=ee4af79a0a144bcaa9a5bc0af6eec215;I32:ADR.C[DC04]=1;F:ADR.AL[DC04]=1.554922;I32:ATE.C[DC04.contoso.local]=2;F:ATE.AL[DC04.contoso.local]=1;I32:ADS.C[DC04]=1;F:ADS.AL[DC04]=2.897117,HttpProxyException=Microsoft.Exchange.HttpProxy.HttpProxyException: Server Locator Service call had a communication error. ---> Microsoft.Exchange.Data.ServerLocator.ServerLocatorClientTransientException: Server Locator Service call had a communication error. ---> System.ServiceModel.EndpointNotFoundException: Could not connect to net.tcp://exch02.contoso.local:64337/Exchange.HighAvailability/ServerLocator. The connection attempt lasted for a time span of 00:00:02.8010812. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond 10.2.2.1::64337. 
---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond 10.2.2.1::64337 at System.Net.Sockets.Socket.InternalEndConnect(IAsyncResult asyncResult) at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult) at System.ServiceModel.Channels.SocketConnectionInitiator.ConnectAsyncResult.OnConnect(IAsyncResult result) --- End of inner exception stack trace ---
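
The useful facts in an entry like this are the ServerLocatorError, the proxy exception type and the back end the front end could not reach (here exch02.contoso.local on TCP 64337). This minimal sketch pulls those facts out of a single HttpProxy log line. It assumes the exception text appears as in the sample above. The function name is hypothetical.

```python
import re

EXCEPTION_RE = re.compile(r"HttpProxyException=([\w.]+)")
NET_TCP_RE = re.compile(r"Could not connect to net\.tcp://([^\s/]+)")


def proxy_failure(line: str) -> dict:
    """Extract the proxy exception type and any unreachable net.tcp
    host:port from one HttpProxy log line."""
    exception = EXCEPTION_RE.search(line)
    target = NET_TCP_RE.search(line)
    return {
        "exception": exception.group(1) if exception else None,
        "unreachable": target.group(1) if target else None,
        "server_locator_error": "ServerLocatorError" in line,
    }
```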

Scenario 2: Test-MigrationServerAvailability fails with 403 Forbidden

The connection to the server '<GUID>.resource.mailboxmigration.his.msappproxy.net' could not be completed., The call to 'https://<GUID>.resource.mailboxmigration.his.msappproxy.net/EWS/mrsproxy.svc' failed. Error details: The HTTP request was forbidden with client authentication scheme 'Negotiate'. --> The remote server returned an error: (403) Forbidden.., The HTTP request was forbidden with client authentication scheme 'Negotiate'., The remote server returned an error: (403) Forbidden.

Follow this article, as it is applicable for both Modern and Classic hybrid topologies.

Scenario 3: Test-MigrationServerAvailability fails with 401 Unauthorized, Access denied or Invalid credentials

Check this support article.

Check the authentication methods on all EWS virtual directories in IIS. Make sure the Negotiate provider under Windows Authentication is enabled on all of them.

Make sure the on-premises migration admin has at least Exchange Recipient Admin permissions (or Recipient Management, depending on the Exchange version). If you are running Modern HCW, this is usually the same on-premises account that has Organization Management rights. If you use a different account, HCW shows you whether its permissions and credentials are correct when you enter them. We recommend that you create a new on-premises account (do not copy an existing one) that has only Exchange Recipient Admin permissions, and then test with that account (Test-MigrationServerAvailability):

[Screenshot: Test-MigrationServerAvailability run with the dedicated migration admin account]

Do you have devices that pre-authenticate the requests coming from Exchange Online to the on-premises Exchange servers (EWS and Autodiscover paths)? If so, this is not supported. If you are not sure, temporarily bypass the network devices in front of the Exchange servers and allow direct access to the servers. Then check whether Test-MigrationServerAvailability returns the same error.

When you run Test-MigrationServerAvailability, note the timestamp when you get the error. Then check the IIS logs on each Exchange Client Access server around the exact time (HH:MM:SS) the cmdlet ran; the logs use UTC. Look at the entries for mrsproxy.svc and their statuses. Normally, the following are the first 3 IIS entries for one successful request (Test-MigrationServerAvailability):

Successful
2019-08-28 06:57:38 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 0 0 0
2019-08-28 06:57:42 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 1 2148074254 15
2019-08-28 06:57:42 192.168.2.50 POST /EWS/mrsproxy.svc - 443 miry\administrator 4.4.0.1 - 200 0 0 125
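
In the successful sample, the two leading 401 entries are the normal anonymous request and authentication challenge. Only the last status of the attempt tells you how it ended. This minimal sketch maps that final status to the scenarios in this post. The hint texts and the function name are hypothetical.

```python
HINTS = {
    200: "request authenticated and succeeded",
    401: "authentication did not complete: see Scenario 3",
    403: "forbidden: see Scenario 2",
    500: "server-side error: check the HttpProxy logs (see Scenario 1)",
    503: "service unavailable: see Scenario 5",
}


def request_outcome(statuses):
    """statuses: ordered sc-status values logged for one
    Test-MigrationServerAvailability attempt, e.g. [401, 401, 200].

    Leading 401s are expected (anonymous request, then challenge),
    so only the last status decides the outcome.
    """
    if not statuses:
        return "no entries: the request may not have reached IIS (check firewall logs)"
    return HINTS.get(statuses[-1], f"unexpected status {statuses[-1]}")
```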

Some other error examples:

Issue: wrong credentials of migration admin

Message : The connection to the server 'mail.contoso.com' could not be completed. ErrorDetail : Microsoft.Exchange.MailboxReplicationService.RemotePermanentException: The Mailbox Replication Service was unable to connect to the remote server using the credentials provided. Please check the credentials and try again. The call to 'https://mail.contoso.com/EWS/mrsproxy.svc' failed. Error details: The HTTP request is unauthorized with client authentication scheme 'Negotiate'. The authentication header received from the server was 'Negotiate,NTLM'. --> The remote server returned an error: (401) Unauthorized.

IIS Logs
---------
2019-08-28 07:19:05 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 0 0 2390
2019-08-28 07:19:10 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 1 2148074254 0
2019-08-28 07:19:10 192.168.2.50 POST /EWS/mrsproxy.svc - 443 - 4.4.0.1 - 401 1 2148074252 46
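
The sc-win32-status column tells the two 401 cases apart. In this wrong-credentials sample, the last entry carries 2148074252 where a successful exchange would continue to a 200. Writing the value in hex makes it searchable. This minimal sketch covers only the few codes that appear in the samples in this post. The table and function name are hypothetical.

```python
# A few Win32 / SSPI codes that appear in the IIS samples in this post.
KNOWN_CODES = {
    0x00000000: "success",
    0x00000005: "ERROR_ACCESS_DENIED",
    0x8009030C: "SEC_E_LOGON_DENIED: the logon attempt failed",
    0x8009030E: "SEC_E_NO_CREDENTIALS: no credentials yet (seen mid-handshake)",
}


def describe_win32_status(value: int) -> str:
    """Format an IIS sc-win32-status value as hex plus a known name."""
    code = value & 0xFFFFFFFF
    return f"0x{code:08X} {KNOWN_CODES.get(code, 'not in this table')}"
```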

Issue: authentication scheme mismatch (EXO requires Negotiate / NTLM, on-premises gives us Basic only)

Message : The connection to the server 'mail.contoso.com' could not be completed ErrorDetail : Microsoft.Exchange.MailboxReplicationService.RemotePermanentException The Mailbox Replication Service was unable to connect to the remote server using the credentials provided. Please check the credentials and try again. The call to 'https://mail.contoso.com/EWS/mrsproxy.svc failed. Error details: The HTTP request is unauthorized with client authentication scheme 'Negotiate'. The authentication header received from the server was 'Basic Realm="mail.contoso.com"'. --> The remote server returned an error: (401) Unauthorized.

First, make sure that NTLM is enabled on the EWS virtual directory: run Get-WebServicesVirtualDirectory | FL and check that NTLM is listed in the authentication methods. Also double-check in IIS Manager that the Negotiate provider is present under Windows Authentication. The Negotiate provider falls back to NTLM when Kerberos is not possible, which is always the case for external clients. Therefore, Exchange Online MRS requires NTLM inside the Negotiate provider on the on-premises EWS virtual directory.

If you use Azure AD Application Proxy (AADAP) as the reverse proxy for your MRSProxy servers, be aware of one AADAP limitation. AADAP cannot present both the Negotiate and NTLM providers in the WWW-Authenticate header, regardless of the order of the providers in IIS. With this setup, you can either remove the NTLM provider from Windows Authentication on EWS in IIS Manager or bypass AADAP. Removing NTLM leaves only the Negotiate provider, which also handles NTLM.
If the Exchange and IIS configuration is fine, check the publishing rules for EWS on your firewall / reverse proxy.

You can also use this mini PowerShell script to check the authentication methods advertised by your on-premises servers. Replace the URL with your on-premises MRSProxy / EWS namespace. It is also good practice to check the WWW-Authenticate headers from both an external and an internal PC and note any differences in the output. For example, Basic, Negotiate and NTLM from internal is fine, but only Basic from external is not.

$req = [System.Net.HttpWebRequest]::Create("https://mail.contoso.com/ews/MRSProxy.svc")
$req.UseDefaultCredentials = $false
$req.GetResponse()
# Expected error: Exception calling "GetResponse" with "0" argument(s):
# "The remote server returned an error: (401) Unauthorized."
$ex = $error[0].Exception
$resp = $ex.InnerException.Response
$resp.Headers["WWW-Authenticate"]
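
The same check can be sketched in Python, which is handy on a non-Windows machine outside your network. This is a minimal sketch: it sends one anonymous GET and reads the WWW-Authenticate headers from the 401 response. The rating follows the guidance above, under which Negotiate must be offered and Basic-only is not acceptable. The function names are hypothetical. Connection failures surface as urllib.error.URLError.

```python
import urllib.error
import urllib.request


def advertised_schemes(url: str, timeout: float = 30.0):
    """Send an anonymous GET and return the WWW-Authenticate header values."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return []  # no 401 challenge at all
    except urllib.error.HTTPError as err:
        return err.headers.get_all("WWW-Authenticate") or []


def scheme_names(headers):
    """Lower-case scheme names; handles 'Negotiate,NTLM' and 'Basic realm=...'."""
    names = set()
    for header in headers:
        for piece in header.split(","):
            token = piece.strip().split(" ", 1)[0]
            if token and "=" not in token:  # skip auth parameters
                names.add(token.lower())
    return names


def assess(headers) -> str:
    names = scheme_names(headers)
    if "negotiate" in names:
        return "ok: Negotiate offered"
    if names == {"basic"}:
        return "not ok: Basic only"
    return "not ok: Negotiate missing"
```

For example, run assess(advertised_schemes("https://mail.contoso.com/ews/MRSProxy.svc")) once from inside and once from outside your network, and compare the results.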

Scenario 4: Test-MigrationServerAvailability fails with SSL / TLS error

Microsoft.Exchange.Migration.MigrationServerConnectionFailedException: The connection to the server'hybrid.contoso.com' could not be completed. ---> Microsoft.Exchange.MailboxReplicationService.MRSRemotePermanentException: The Mailbox Replication Service could not connect to the remote server because the certificate is invalid. The call to 'https://hybrid.contoso.com/EWS/mrsproxy.svc' failed. Error details: Could not establish trust relationship for the SSL/TLS secure channel with authority 'hybrid.contoso.com'. -->The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. --> The remote certificate is invalid according to the validation procedure.

Whenever you see SSL/TLS errors, you would check the following:

TLS 1.2 should be enabled in the on-premises infrastructure.

For Classic hybrid, you can use this PowerShell command while connected to your Office 365 Exchange Online tenant. It tests the request over TLS 1.2 so you can see whether you also get an SSL/TLS error:
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12;Invoke-WebRequest https://<endpoint FQDN>/ews/mrsproxy.svc -Verbose
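
A rough Python counterpart can be run from any machine. It pins the handshake to TLS 1.2 and verifies the certificate chain and host name. An untrusted or mismatched certificate raises ssl.SSLCertVerificationError, which roughly corresponds to the "Could not establish trust relationship" error above. This is a minimal sketch. It tests from wherever you run it, not from Exchange Online itself. The host name is a placeholder and the function names are hypothetical.

```python
import socket
import ssl


def tls12_context() -> ssl.SSLContext:
    """Client context pinned to TLS 1.2 that verifies chain and host name."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    return context


def probe_tls12(host: str, port: int = 443, timeout: float = 10.0):
    """Handshake with host:port over TLS 1.2; return (version, peer cert).

    Raises ssl.SSLCertVerificationError for an untrusted or mismatched
    certificate and ssl.SSLError if TLS 1.2 cannot be negotiated.
    """
    context = tls12_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.getpeercert()
```

For example, call probe_tls12("hybrid.contoso.com"), replacing the host with your published EWS name.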

For Modern hybrid, you can use Test-HybridConnectivity, which also checks for TLS 1.2. This check is a mandatory step when configuring / installing the Hybrid Agent. See the Verifying Connectivity section here.

Make sure you are not doing SSL offloading for MRSProxy.svc

For Classic hybrid, a valid third-party certificate is required for EWS. See this and this.

You can use the following command in Exchange Management Shell to quickly check the Exchange certificates and some properties on them from all your Exchange servers in the organization:

Foreach ($i in (Get-ExchangeServer)) {Write-Host $i.FQDN; Get-ExchangeCertificate -Server $i.Identity | FT Thumbprint, Status, RootCAtype, Services, Subject}
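
Get-ExchangeCertificate shows what is installed on each server. From the client side, you can also check that the certificate actually presented on the published EWS host has not expired. This is a minimal sketch, assuming the notAfter string in the format returned by Python's ssl.SSLSocket.getpeercert() (e.g. "Jan 15 00:00:00 2030 GMT"). Expiry is only one part of validity; the trust chain, names and CRL access still matter. The function name is hypothetical.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative = expired).

    not_after uses the getpeercert() format, e.g. "Jan 15 00:00:00 2030 GMT".
    now is seconds since the epoch (UTC); defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```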

Also, the CRLs for the certificate must be accessible.

Scenario 5: Test-MigrationServerAvailability fails with 503 Service Unavailable

The call to 'https://<GUID>.resource.mailboxmigration.his.msappproxy.net/EWS/mrsproxy.svc' failed. Error details: The HTTP service located at https://<GUID>.resource.mailboxmigration.his.msappproxy.net/EWS/mrsproxy.svc is unavailable. This could be because the service is too busy or because no endpoint was found listening at the specified address. Please ensure that the address is correct and try accessing the service again later. --> The remote server returned an error: (503) Server Unavailable

For this error, it is best to check the HTTPProxy logs for EWS and see whether the request was proxied to a server that is unavailable. For example, the server's ServerWideOffline component may be Inactive in Get-ServerComponentState, its EWS application pool may not be started, or its MRS service may be stopped. Also note the Exchange 2010 coexistence case. If you have Exchange 2013/2016 in coexistence with Exchange 2010, the Exchange 2013/2016 servers proxy only to servers of the same version (2013/2016) and will not proxy down to Exchange 2010.

This brings us to the end of this post! We hope you find it useful if you want to learn a bit more about how migration endpoints work, or if you need to troubleshoot this area.

I wanted to thank Brad Hughes, Jason Nelson, Nino Bilic and Greg Taylor for their review of this post.

Mirela Buruiana
