This post has been republished via RSS; it originally appeared at: SQL Server Integration Services (SSIS) articles.
First published on MSDN on Aug 05, 2013In the previous blog article , I described how to use the free SSIS DQS Domain Value Import from OH22 data to automate importing values into a DQS Knowledge Base domain. In this article, I will go over the optional configuration to import invalid/error value types as well as to set synonyms.
First, let's quickly recap the domain type and synonym.
Domain type
In DQS, you can define a domain value with different types:
- Correct : This is a valid value that belongs to the domain. For example, in the color name domain, red is a valid color name.
- Error : This is a value that is incorrect. For example, 'tbd' is not a correct color name.
- Invalid : This is a value that may be correct but does not belong to the domain. For example: oliver is a correct person name but it is not a correct color name.
Note : The difference between Error and Invalid is often subtle. Whether you set a domain value as Error or Invalid , DQS cleansing gives the same result and marks records containing those values as Invalid . Many DQS implementation combine the Error and Invalid and only use Correct or Invalid type to simplify.
Synonym
When setting up domain value type to Error or Invalid , you can specify Correct To value. DQS cleansing then automatically replace the value with the Correct To (i.e. synonym).
The SSIS DQS Domain Value Import can automate importing values with different types and setting up synonym above. To illustrate, I use the Knowledge Base from the previous article :
Importing domain values with different types
Consider that we would like to insert new values to the domain Name from a SQL table:
CREATE TABLE DQSCOLORS2 (
NAME NVARCHAR(64),
DomainType INT
)
-- Note: that the column containing domain value must have NVARCHAR/NCHAR and Domain type must be set to INT data type.
INSERT INTO DQSCOLORS2
VALUES
('white', 0)
,('tbd', 1)
,('oliver',2)
The domain type column must be set to one of the following integer value:
- 0 (correct)
- 1 (error)
- 2 (invalid)
In this example, 'white' is a correct color name, 'tbd' is an error, and 'oliver' is an invalid color name.
Configure the SSIS DQS Domain Value Import as follows:
- Specify connection to the DQS server
- Select the Data Quality Knowledge Base from the drop down (e.g. Colors ).
- Select the Domain from the drop down (e.g. Name )
- Specify input column for the Leading Value (e.g. Name)
- Specify input column for the type (e.g. DomainType). Note : The drop down only shows column with INT data type. Check the data type if you do not see the column name from the drop down.
- Accept all other default setting (In part 3 of this series, I will explain how to use error handling options)
- Click OK to finish
Make sure the DQS Knowledge Base Colors is published before executing the SSIS package. After the SSIS package completed, open the Knowledge Base to review the result:
The domain values are added with the specified type.
A couple of things to consider when using the Domain Type option:
- Domain type source column must be set to INT and contained only 0,1,2 and does not have NULL -- any violation to this will result in error during package execution. See the next article for more information on error handling.
- You can't update an existing domain type (e.g. you want to change green from Correct to I nvalid ). This must be done through DQS Client.
Importing domain values with synonym
In this example, I set up a new table with the following:
CREATE TABLE DQSCOLORS3 (
NAME NVARCHAR(64),
Synonym NVARCHAR(64)
)
INSERT INTO DQSCOLORS3 VALUES
('white', 'bright white')
,('DQS_NULL', 'not specified')
,('clear', 'transparent')
The first row sets synonym to an existing valid value (e.g. 'white') in the domain. The second row sets synonym to a special domain value called DQS_NULL (this value can be used to specify how to handle NULL during DQS cleansing). The last row set synonym where both the leading value and its synonym are not currently in the domain.
Configure the SSIS DQS Domain Value Import as follows:
Make sure the DQS Knowledge Base Colors is published, then execute the SSIS package and review the result:
The synonyms are successfully added to the DQS domain.
'bright white' and 'not specified' are set as Correct to 'white' and DQS_NULL accordingly. This means, for example, that when you are cleansing records containing 'not specified', it will be corrected to null. Lastly, both 'clear' and 'transparent' are added with 'clear' set as the leading value. Note that when importing leading value and synonyms using SSIS DQS Domain Value Import, the synonyms are always set to invalid type (e.g. 'transparent') while the leading value are always set to correct (e.g. 'clear').
A few things to consider when using the Synonym option:
- Synonyms are created as new value with domain type always set to Invalid
- You can set synonym to an existing Correct domain values (e.g. 'white" and 'DQS_NULL') -- If you try to set synonym to an Invalid / Error domain, you will get an error during package execution. See the next article for more information on error handling.
- You can set synonym to BOTH a new domain value and a new synonym pair (e.g. 'clear' and 'transparent' are new and not previously exists in the Color Name domain)
Importing domain values with both different types and synonym
In this last example, we will import data that include both type and synonym:
CREATE TABLE DQSCOLORS4 (
NAME NVARCHAR(64),
DomainType INT,
Synonym NVARCHAR(64)
)
INSERT INTO DQSCOLORS4
VALUES
('red',NULL ,'dark red')
,('blue',0 ,'dark blue')
,('blue',1 ,'light blue')
,('blue',3 ,'bright blue')
,('yellow',0 , NULL)
DQS Domain Value import is configured to map all three input columns:
The result of the SSIS package execution shows the following:
A few things to consider when using both Domain Type and Synonym option:
- Domain type is ignored when you have synonym. The new synonym always have Invalid type. (e.g. see 'dark red', 'dark blue', 'light blue' examples above)
- When the synonym is NULL, then the Domain Type must have non NULL valid of either 0,1,2. In another word, you can't have both domain type and synonym to be NULL. This will result in an error during package execution.
Let's proceed to the next article where I describe the options you can use to handle error during import.