代码之家  ›  专栏  ›  技术社区  ›  Przemyslaw Remin

将UTF<U+something>转换为nvarchar

  •  0
  • Przemyslaw Remin  · 技术社区  · 6 年前

    我有一个奇怪的UTF字符串存储:

    <U+0410><U+043B><U+044C><U+043A><U+0430>
    

    我怎样才能把它转换回nvarchar?上面的字符串应该转换为 Алька

    +-------------------------------------------------------------------------------------+-----------------+
    |                                   Column1_encoded                                   | Column1_decoded |
    +-------------------------------------------------------------------------------------+-----------------+
    | <U+0410><U+043B><U+044C><U+043A><U+0430>                                            | Алька           |
    | ABC <U+0410><U+043B><U+044C><U+043A><U+0430> 1                                      | ABC Алька 1     |
    | <U+0410><U+043B> 2 <U+044C><U+043A><U+0430>                                         | Ал 2 ька        |
    | <U+0410><U+043B><U+044C><U+043A><U+0430> 3 <U+0410><U+043B><U+044C><U+043A><U+0430> | Алька 3 Алька   |
    +-------------------------------------------------------------------------------------+-----------------+
    

    我在通过R转换将数据从Power BI发送到SQL Server时得到了这种奇怪的格式,正是这样: https://stackoverflow.com/a/51386029/1903793

    杰罗恩·莫斯特在评论中的回答似乎能解决这个问题。谢谢您。

    1 回复  |  直到 6 年前
        1
  •  1
  •   iamdave    6 年前

    为了在多个列值中使用它,您需要将其转换为表值函数并通过 cross apply ,不过我相信你自己能做到。注释中有解释:

    declare @str nvarchar(1000) = '<U+0410><U+043B><U+044C><U+043A><U+0430> This is a string with <U+0410><U+043B><U+044C><U+043A><U+0430> not encoded as we would like <U+0410><U+043B><U+044C><U+043A><U+0430>';
    
    -- Add an additional > character before the first < character to act as the first delimiter
    -- and then insert a delimiting > character before any instances of a < chracter that follow a space to ensure the character code is properly parsed out.
    select @str = replace(stuff(@str,charindex('<',@str,1),0,'>'),' <',' ><');
    
                    -- Start tally table with 10 rows.
    with n(n)   as (select n from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(n))
                    -- Select the same number of rows as characters in @str as incremental row numbers.
                    -- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
        ,t(t)   as (select top (select len(@str) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
                    -- Return the position of every value that follows the specified delimiter.
        ,s(s)   as (select 1 union all select t+1 from t where substring(@str,t,1) = '>')
                    -- Return the start and length of every value, to use in the SUBSTRING function.
                    -- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
        ,l(s,l) as (select s,isnull(nullif(charindex('>',@str,s),0)-s,4000) from s)
        ,r      as (select rn as ItemNumber
                        ,Item
                    from(select row_number() over(order by s) as rn
                                ,substring(@str,s,l) as item
                        from l
                        ) a
                    where Item <> ''
                    )
    select cast((select case when left(Item,3) = '<U+'  -- Where required, convert the Unicode number into a character using the NCHAR function
                            then nchar(convert(nvarchar(500),convert(int,(convert(varbinary(max),replace(Item,'<U+','0x0000'),1)))))
                            else Item
                            end
                from r
                order by ItemNumber
                for xml path('')
                ) as nvarchar(max)) as String;
    

    输出:

    +----------------------------------------------------------------------+
    | String                                                               |
    +----------------------------------------------------------------------+
    | Алька This is a string with Алька not encoded as we would like Алька |
    +----------------------------------------------------------------------+