SQLSweet16!, Episode 10: “I can eat glass …”, but can I load it into a database?

This post has been republished via RSS; it originally appeared at: DataCAT articles.

First published on MSDN on Oct 19, 2016

Authored by Sanjay Mishra


Sanjay Mishra

Reviewed By: Dimitri Furman, Murshed Zaman, Kun Cheng



If you have tried to use BULK INSERT or bcp utilities to load UTF-8 data into a table in SQL Server 2014 or in an earlier release (SQL Server 2008 or later), you have likely received the following error message:
Msg 2775, Level 16, State 13, Line 14
The code page 65001 is not supported by the server.
The requirement to support UTF-8 data for these utilities has been extensively discussed on various forums, most notably on Connect .

This requirement has been addressed in SQL Server 2016 (and backported to SQL Server 2014 SP2). To test this, I obtained a UTF-8 dataset from http://www.columbia.edu/~fdc/utf8/ . The dataset is translation of the sentence “ I can eat glass and it doesn’t hurt me ” in several languages. A few lines of sample data are shown here:



(As an aside, it is entirely possible to load Unicode text such as above into SQL Server even without this improvement, as long as the source text file uses a Unicode encoding other than UTF-8.)
-- SQL Server 2014 SP1 or earlier

CREATE DATABASE DemoUTF8_2014
GO

USE DemoUTF8_2014
GO

CREATE TABLE Newdata
(
lang VARCHAR(200),
txt NVARCHAR(1000)
)
GO

BULK INSERT Newdata
FROM 'C:\UTF8_Test\i_can_eat_glass.txt'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR='\t', CODEPAGE='65001')
GO

Msg 2775, Level 16, State 13, Line 14
The code page 65001 is not supported by the server.

-- SQL Server 2016 RTM or SQL Server 2014 SP2 or later

CREATE DATABASE DemoUTF8_2016
GO

USE DemoUTF8_2016
GO

CREATE TABLE Newdata
(
lang VARCHAR(200),
txt NVARCHAR(1000)
)
GO

BULK INSERT Newdata
FROM 'C:\UTF8_Test\i_can_eat_glass.txt'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR='\t', CODEPAGE='65001')
GO

(150 row(s) affected)
SELECT * FROM Newdata
GO


You can now use CODEPAGE='65001' with BULK INSERT, bcp and OPENROWSET utilities.

Note that this improvement is only scoped to input processing by bulk load utilities. Internally, SQL Server still uses the UCS-2 encoding when storing Unicode strings.


Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.