Extending SQL Server Migration Assistant’s conversion capabilities

This post has been republished via RSS; it originally appeared at: Microsoft Data Migration articles.

 

With the SSMA 7.3 release, we exposed some of SSMA's conversion functionality to the public in order to allow end-users to extend the conversion capabilities of the tool. I will try to cover some parts of it in a series of blog posts.

 

Before we go into details on how to implement a custom conversion, I wanted to share some details about how SSMA internals work and explain how schema conversion is done. This will help us to better understand which parts of the conversion pipeline can be extended in this release.

 

In SSMA conversion consists of these 3 basic steps:

 

  1. Parse source object definition into source-specific syntax tree (for example, PL/SQL header and body of Oracle’s stored procedure).
  2. Convert source-specific syntax tree into SQL Server specific syntax tree.
  3. Format SQL Server specific syntax tree to the T-SQL statements.

The output of step 3 can be executed directly against SQL Server (assuming there were no conversion errors) and you will get converted schema on the target.

 

Internally, SSMA stores all syntax trees in an XML-like data structure which are represented by a hierarchy of XNodes. If you are familiar with XLINQ, then concept of XDocument and XNode should not be new to you, but due to historical reasons  SSMA has its own implementations of these classes defined in Microsoft.SSMA.Framework.Generic.XTree namespace.

 

So, the output of step 1 is a set of XNodes and it’s being passed to step 2, which performs the actual schema conversion. Under the hood, it is implemented by a set of so-called Node Converters. The input for the Node Converter is an XNode and output is another XNode that represents SQL Server specific syntax node, but more on this later. After the conversion is done, the resulting  set of XNodes is being formatted to T-SQL by the set of Node Formatters (step 3) .

 

Having established all these basic concepts of schema conversion, it is time now to discuss what can be improved externally in this release. Starting from SSMA 7.3 all of existing Node Converter implementations are made public, but that change alone will not give you any real benefit, so on top of that there is a mechanism now that allows you to inject your custom Node Converter into the SSMA conversion pipeline through Microsoft Extensibility Framework (aka MEF) .

 

In this blog post I will demonstrate how one can create their own Node Converter and integrate it into SSMA’s schema conversion step. For this example, I picked one of the conversions that is not yet supported by SSMA, but fairly easy to implement for a limited set of cases – OVER clause with simple window functions, like this example for AdventureWorks:

 

SELECT FIRST_VALUE(ProductName) OVER (ORDER BY ProductID) AS FirstProduct

FROM Sales.SalesOrderDetail;

 

It is worth saying that SSMA supports custom Node Converters for all available source platforms, but the sake of simplicity, let’s consider Oracle as a source for our sample conversion. The reason why I decided to pick Oracle is because it does support windowing functions in a very similar way, so we can focus more on the SSMA conversion infrastructure, rather than on the actual conversion logic implementation.

 

First thing you’ll need to do is create a new empty Class Library project in Visual Studio, targeting .NET Framework 4.   After the project is created, you will need to add references to SSMA extensibility assemblies. All required SSMA binaries will be in the installation folder (typically it is C:\Program Files (x86)\Microsoft SQL Server Migration Assistant for Oracle\bin), you will need to add the following to your project:

 

  • Microsoft.SSMA.Framework.Generic.Constants
  • Microsoft.SSMA.Framework.Generic.Converters.Default
  • Microsoft.SSMA.Framework.Generic.Factories
  • Microsoft.SSMA.Framework.Generic.SqlBase.Default
  • Microsoft.SSMA.Framework.Generic.SqlConstants
  • Microsoft.SSMA.Framework.Generic.XTree
  • Microsoft.SSMA.Framework.Oracle.Constants
  • Microsoft.SSMA.Framework.Oracle.Generic
  • Microsoft.SSMA.Framework.Oracle.SqlServer
  • Microsoft.SSMA.Framework.SqlServer.Constants

 

Lastly, you need to add reference to System.ComponentModel.Composition to be able to use MEF.

 

When all the required references are set, add new class to the project – this will be our custom Node Converter implementation, let’s call it CustomOverExpressionConverter.

 

In the newly created class file add the following namespace references:

 

using Microsoft.SSMA.Framework.Generic.Converters;

using Microsoft.SSMA.Framework.Generic.Converters.Default;

using Microsoft.SSMA.Framework.Generic.XTree;

using Microsoft.SSMA.Framework.Oracle.Constants;

using Microsoft.SSMA.Framework.Oracle.Generic;

using Microsoft.SSMA.Framework.Oracle.SqlServerConverter.NodeConverters;

using Microsoft.SSMA.Framework.SqlServer.Constants; using System.ComponentModel.Composition;

 

All Node Converters should implement INodeConverter interface, but for every source platform we provide a base class that already implements all required basic functionality, so that you can focus on the actual conversion logic. For Oracle source this class is called O2SSNodeConverter.

 

Note: “O2SS” is a commonly used abbreviation in the SSMA, so it’s worth it to familiarize yourself with it. For other sources this will be “A2SS” for Access, “M2SS” for MySQL, “S2SS” for Sybase and “Db2Mf2SS” for DB2.

 

For SSMA to be able to discover our converter, we need to export it through MEF. To do so, add the following attribute to the class: Export(typeof(INodeConverter)). After SSMA finds your converter, it needs to know which source XNode it actually converts. To provide this extra piece of information to SSMA we will need to add one more attribute to our class:NodeConverter(OracleParserConstants.NodeNames.OVER_EXPRESSION). Argument for this attribute is the string name of the XNode that this Node Converter should be applied to. This is where it gets a little bit more complicated, without knowing all of the SSMA internals. Since syntax tree is represented as a tree, to perform the conversion SSMA traverses the tree and looks up the converter for every node on its way. Once it finds the converter for the node – it passes XNode to that converter and it’s up to the converter to convert the node itself and all its children. Most of the converters provide some conversion logic specific to the node in question and leverage other converters to perform the conversion for any fundamental constructs they need to. Consider this example:

 

SELECT MAX(Age)

FROM Persons

GROUP BY LastName

 

This query can be represented with the following abstract syntax tree:

 

Select-statement

 

- Column-list

- - Identifier (MAX)

- - - Identifier-arguments

- - - - Identifier (Age)

- From-clause

- - Identifier (Persons)

- Group-by-clause

 

- - Identifier (LastName)

 

You can see that there are multiple “Identifier” nodes in the tree, which are not specific to any of the parent nodes. In this case, “Select-statement” Node Converter can rely on other converters (e.g. identifier converter) to convert parts of its own subtree. SSMA already provides lots of Node Converters for multiple different fundamental constructs. In fact, “From-clause” and “Group-by-clause” both have their own separate converters, rather than being converted as part of the “Select-statement” converter. Now getting back to our converter, we have to specify which node our converter should be applied to, when SSMA traverses the tree. This may not be a trivial task, as you do not have context as to how the parser works, but you may be able to get some clues by looking at the list of all the nodes that exist in SSMA’s universe. To do so you can use IntelliSense to look at members of the OracleParserConstants.NodeNames class. In our case we will be targeting “over-expression” node and it has a corresponding constant called OracleParserConstants.NodeNames.OVER_EXPRESSION. This is what parameter of the NodeConverterAttribute constructor represents.

 

By now we should have a class declaration similar to this:

 

[Export(typeof(INodeConverter))]

[NodeConverter(OracleParserConstants.NodeNames.OVER_EXPRESSION)]

public class CustomOverExpressionConverter : O2SSNodeConverter

 

If you are familiar with the SSMA conversion capabilities, you may know that it already supports some of the window functions. This means that Node Converter for “over-expression” node already exists. The problem is that it does not support some of the functions that have direct analogy in newer version of SQL Server, FIRST_VALUE that we want to target in our example is one of those unsupported ones. Since SSMA is only able to use one Node Converter per node, we will have to make sure that original conversion logic is preserved for all other functions as part of our implementation. For this, lets define a private field that will hold the original Node Converter for “over-expression”, so that we can call into it, when needed.

 

private readonly INodeConverter originalOverExpressionConverter;

 

We now need to initialize a new instance of the built-in OverExpressionConverter, but once you try to do so, you will realize that it requires something called DefaultDocumentConverter to be passed to its constructor. This is the parent converter that allows you to access other converters from within yours and as you may guess – it is also required by the base class that we used for our converter (O2SSNodeConverter). Luckily you don’t have to worry about getting an instance of the parent document converter, as SSMA infrastructure will provide you one. All we need to do is implement the constructor that accepts the DefaultDocumentConverter as an argument and mark it with the ImportingConstructor MEF attribute. Then we can initialize our original “over-expression” Node Converter with the given instance, as following:

 

public CustomOverExpressionConverter(DefaultDocumentConverter documentConverter)

    : base(documentConverter)

{

    originalOverExpressionConverter = new OverExpressionConverter(documentConverter);

}

 

Now we got to the point where we will implement our custom converter logic. All we need to do is override ConvertNode method of the base class:

 

public override XNode ConvertNode(XNode sourceNode, XNode targetParent, IConversionContext context)

 

But before we jump into the conversion logic, let’s try to better understand how over-expression node looks like in a real world. Typically, OVER expression will have two parts to it, our sample query (FIRST_VALUE(ProductName) OVER (ORDER BY ProductID)) may be represented with the following abstract syntax tree:

 

Over-expression

 

- Identifier (FIRST_VALUE(ProductName))

- - Identifier-params (ProductName)

- Over-expression-params (ORDER BY ProductID)

- - Order-by-clause

 

- - - ...

 

For our sample converter, lets target this variation of the over-expression node. First thing we would do in our conversion implementation is check whether it is something that we can convert.

 

// Typical over-expression node will have two children: aggregate function identifier and over-expression parameters.

 

if (sourceNode.Children.Count == 2)

{

    // Get the function identifier node

    var sourceFunctionIdentifierNode = sourceNode.Children[0];

 

    // Extract function name

    var sourceFunctionCompoundName = OracleNameProcessor.Instance.GetNameFromIdentifier(sourceFunctionIdentifierNode);

    

    // Check, if function identifier is a simple one-part identifier and is suitable for our conversion

    if (sourceFunctionCompoundName.Count == 1 && sourceFunctionCompoundName.Last  .NormalName == "FIRST_VALUE")

    {

    ...

 

At this point we know that our source node had two children where first one was an identifier for the "FIRST_VALUE" window function. We can happily proceed to the actual conversion now. Thanks to existing fundamental converters that are already implemented in the SSMA – our conversion logic will be pretty straight forward: all we need to do is create new SQL Server specific "over-expression" node under provided targetParent and convert both children nodes:

 

// Create over-expression node in the target tree

var targetOverExpressionNode = targetParent.Children.AddNew(SqlServerParserConstants.NodeNames.OVER_EXPRESSION);

 

Since our first child represents a function identifier that is not known to SSMA (otherwise it would be already converted) – we cannot use built-in SSMA conversions to convert it. Instead we will create new SQL Server identifier node on the fly and set it to the same name as original Oracle’s function, because we know that it works in newer versions of SQL Server:

 

// Create identifier node for the aggregate function in the target over-expression

var targetFunctionIdentifierNode = targetOverExpressionNode.Children.AddNew(SqlServerParserConstants.NodeNames.SIMPLE_IDENTIFIER);

targetFunctionIdentifierNode.Attributes[SqlServerParserConstants.AttributeNames.TEXT] =

    targetFunctionIdentifierNode.Attributes[SqlServerParserConstants.AttributeNames.VALUE] = "FIRST_VALUE";

 

After we have created the target identifier node, we can rely on existing converters to convert its arguments. To invoke existing built-in converters, all we need to do is call base implementation of the ConvertNode method by passing it source node to convert and target parent node, to which all the children should be attached. In our case we just create a new SIMPLE_IDENTIFIER_PARAMS node that will host all arguments and the source node will be the child EXPRESSION_LIST node from the source function identifier:

 

// Create parameters node under target function identifier node

var targetFunctionIdentifierParamsNode =

    targetFunctionIdentifierNode.Children.AddNew(SqlServerParserConstants.NodeNames.SIMPLE_IDENTIFIER_PARAMS);

 

// Convert function arguments

base.ConvertNode(

    sourceFunctionIdentifierNode.GetAllNodesByName(SqlServerParserConstants.NodeNames.EXPRESSION_LIST).First(),

    targetFunctionIdentifierParamsNode, context);

 

Second child is an “over-expression-params” node. This is even simpler – we just create new OVER_EXPRESSION_PARAMS node at the target and convert our source params node, by calling base ConvertNode method again:

 

// Get the parameters for over expression

var sourceOverExpressionParametersNode = sourceNode.Children[1];

   

// Create parameters node in target over-expression

var overExpressionParams = targetOverExpressionNode.Children.AddNew(SqlServerParserConstants.NodeNames.OVER_EXPRESSION_PARAMS);

   

// Convert first child of source over-expression-parameters

base.ConvertNode(

    sourceOverExpressionParametersNode.FirstChild,

    overExpressionParams,

    context);

 

Lastly, we need to return our new converted node:

 

return targetOverExpressionNode;

 

This is it for the successful conversion, but remember – we only convert one function (FIRST_VALUE) and SSMA already supports some other conversions for the OVER expression. To make sure we fallback to original conversion logic for all other functions we need to call our original “over-expression” Node Converter:

 

// Fallback to original over-expression node converter, in case we don't support the input

return originalOverExpressionConverter.ConvertNode(sourceNode, targetParent, context);

 

Congratulations! You’ve just built your first Node Converter! Reasonable question now would be how to use it within SSMA? This is super easy! All you have to do is build your project (it should build successfully with no issues, if you followed all the steps) and copy resulting DLL into the SSMA folder (the one in Program Files, where we got our references from). To verify that SSMA actually found our converter you can navigate to Tools -> Global Settings -> Logging and Converter’s Message level to "Debug". With the debug message level, when you create or open a project you should see something similar to this in the log file:

 

[Converter: Debug] [21316/10] [2017-01-20 16:46:11]: Registering NodeConverter for over-expression: SsmaForOracleConversionSample.CustomOverExpressionConverter

 

If you aren't familiar with the SSMA log files – by default they are stored in %APPDATA%\Microsoft SQL Server Migration Assistant\. Logs are partitioned by the source platform, so that you can distinguish them in case you have multiple instances of SSMA installed. Just pick the latest log file and scan through it looking for the class name of your custom converter. Once you know that your converter was successfully loaded – you can attach Visual Studio debugger to the SSMA and set a break point in your conversion logic to step through and verify that it does what was is intended to.

 

In the beginning, I mentioned that we added this extensibility support to all of our sources. Even though this article does not provide a detailed guidance on how to leverage it for platforms other than Oracle, it should be pretty straightforward: all you will need to do is replace oracle references with Access/MySQL/Sybase/DB2 and use source-specific classes that come with their binaries. Most of the classes have their direct counterparts in other sources, so finding them should not be an issue.

 

This is just a first step in our journey to make SSMA a strong extendable platform for schema conversions. Of course, this doesn’t allow you to override the behavior from start (parsing) to finish (formatting).  We believe this is a good starting point to dive in to the schema conversions that unlocks a lot of potential to our end-users. We are looking forward to your feedback!

 

The complete sample project is attached to this post for download.

 

Sincerely,

The SSMA Team

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.