代码之家  ›  专栏  ›  技术社区  ›  Jake Bullet

字符串分析VARCHAR列

  •  1
  • Jake Bullet  · 技术社区  · 7 年前

    我正在使用一个数据库,在这个数据库中,用户与客户端进行的电话通话的详细信息被保存到单个VARCHAR(MAX)列中。然而,出于详细报告目的(SSR),嵌入在VARCHAR列中的所有各种属性都需要转换为单独的列。为了完成这个解析,我在一个存储过程中使用了一个游标,该存储过程调用一个UDF来处理所需子字符串的提取。

    当查询相对较少的行(不超过几百行)时,查询的性能相当不错。然而,当查询大量行(数千行)时,性能很差,查询常常会超时。因此,我正在寻找一种比我现有的方法扩展得更好的方法。

    <table>
    <tr>
    <td class="key">Dial Result:</td><td id="DialResult" class="value">No Answer</td></tr>
    <tr><td class="key">Client Name:</td><td id="ClientName" class="value">SMITH, BOB</td></tr>
    <tr><td class="key">Number Dialed:</td><td id="NumberDialed" class="value">5555555555 [Day]</td></tr>
    <tr><td class="key">Dial Count:</td><td id="DialCount" class="value">1</td></tr><tr><td class="key">Contact Made:</td><td id="ContactMade" class="value">No</td></tr>
    <tr><td class="key">Campaign Called On:</td><td id="CampaignCalledOn" class="value">TEST CAMPAIGN</td></tr>
    <tr><td class="key">Call Outcome:</td><td id="CallOutcome" class="value">No answer</td></tr>
    <tr><td class="key">Email Sent:</td><td id="EmailSent" class="value">No</td></tr>
    <tr><td class="key">Do Not Call Requested:</td><td id="DoNotCallRequested" class="value">No</td></tr>
    <tr><td class="key">Product Purchased:</td><td id="ProductPurchased" class="value">No</td></tr>
    <tr><td class="key">Order Number:</td><td id="OrderNumber" class="value">N/A</td></tr><tr><td class="key">Order Dollar Value:</td><td id="PurchaseAmount" class="value">0.00</td></tr>
    <tr><td class="key">Purchased SKUs:</td><td id="PurchasedSKUs" class="value">N/A</td></tr>
    </table>
    

    目前,我有一个存储过程,它使用游标迭代“live”数据库表中的行。过程光标如下所示:

    OPEN MainNoteCursor;
    FETCH NEXT FROM MainNoteCursor INTO @SequenceNumber,@ClientId,@ContactNumber,@UserDisplayName,@CreatorId,@DateCol,@NoteText
    
    WHILE (@@FETCH_STATUS <> -1)
        BEGIN
            IF (@@FETCH_STATUS <> -2)
                BEGIN TRY
                    DECLARE @NoteDate date = null
                    DECLARE @DialResult varchar(255) = null
                    DECLARE @ClientName varchar(255) = null
                    DECLARE @NumberDialed varchar(255) = null
                    DECLARE @DialCount int = null
                    DECLARE @ContactMade varchar(3)  = null
                    DECLARE @CampaignCalledOn varchar(255)  = null
                    DECLARE @CallOutcome varchar(255) = null
                    DECLARE @EmailSent varchar(3) = null
                    DECLARE @DoNotCallRequested varchar(3) = null
                    DECLARE @ProductPurchased varchar(3) = null
                    DECLARE @OrderNumber varchar(255) = null
                    DECLARE @PurchaseAmount money = null
                    DECLARE @PurchasedSKUs varchar(255) = null
    
                    SET @NoteDate = CONVERT(date, @DateCol)
    
                    SET @DialResult = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="DialResult" class="value">'',''</td>'',1)
                    SET @ClientName = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="ClientName" class="value">'',''</td>'',1)
                    SET @NumberDialed = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="NumberDialed" class="value">'',''</td>'',1)
                    SET @DialCount = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="DialCount" class="value">'',''</td>'',1)
                    SET @ContactMade = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="ContactMade" class="value">'',''</td>'',1)
                    SET @CampaignCalledOn = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="CampaignCalledOn" class="value">'',''</td>'',1)
                    SET @CallOutcome = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="CallOutcome" class="value">'',''</td>'',1)
                    SET @EmailSent = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="EmailSent" class="value">'',''</td>'',1)
                    SET @DoNotCallRequested = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="DoNotCallRequested" class="value">'',''</td>'',1)
                    SET @ProductPurchased = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="ProductPurchased" class="value">'',''</td>'',1)
                    SET @OrderNumber = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="OrderNumber" class="value">'',''</td>'',1)
                    SET @PurchaseAmount = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="PurchaseAmount" class="value">'',''</td>'',1)
                    SET @PurchasedSKUs = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="PurchasedSKUs" class="value">'',''</td>'',1) 
    
                    INSERT INTO @Return
                        ([SequenceNumber],[ClientId],[ContactNumber],[UserDisplayName],[CreatorId],
                        [DateCol],[NoteDate],[NoteText],[DialResult],[ClientName],[NumberDialed],[DialCount],
                        [ContactMade],[CampaignCalledOn],[CallOutcome],[EmailSent],[DoNotCallRequested],
                        [ProductPurchased],[OrderNumber],[PurchaseAmount],[PurchasedSKUs])
                    VALUES
                        (@SequenceNumber,@ClientId,@ContactNumber,@UserDisplayName,@CreatorId,
                        @DateCol,@NoteDate,@NoteText,@DialResult,@ClientName,@NumberDialed,@DialCount,
                        @ContactMade,@CampaignCalledOn,@CallOutcome,@EmailSent,@DoNotCallRequested,
                        @ProductPurchased,@OrderNumber,@PurchaseAmount,@PurchasedSKUs)  
                END TRY             
    
                BEGIN CATCH
                    --Nothing to do.
                END CATCH       
            FETCH NEXT FROM MainNoteCursor INTO @SequenceNumber,@ClientId,@ContactNumber,@UserDisplayName,@CreatorId,@DateCol,@NoteText
        END
    
    CLOSE MainNoteCursor;
    DEALLOCATE MainNoteCursor;
    
    SELECT * FROM @Return
    

    在SSMS中执行时,我得到如下结果:

    enter image description here

    FN\u PARSE\u文本中的代码使用CHARINDEX()、LEN()、SUBSTRING()等来读取“开始字符串”和“结束字符串”之间的文本。我不会发布完整的功能,因为有很多内务管理代码与此无关,但可以归结为:

    --********************************************************************************
    -- CHARINDEX returns the position of the fist character in @StartKey, but we need to
    -- start reading after the *last* character in @StartKey.  Re-adjust the position
    -- at which we''ll start reading the string.
    --********************************************************************************
    
    SET @StartKeyIndex = @StartKeyIndex + @StartKeyLength
    SET @ReadLength = @EndKeyIndex - @StartKeyIndex
    
    --********************************************************************************
    -- Start / End positions have been determined, so now read out the calculated number
    -- of characters and return to the calling code.
    --********************************************************************************
    
    SET @ReturnValue = LTRIM(RTRIM(SUBSTRING(@noteText, @StartKeyIndex, @ReadLength)))
    

    我曾想过创建一个过夜的批处理过程,在下班时间进行所有解析,然后将数据转储到一个扁平的表中。然后,我会报告平坦的数据,而不是试图实时解析细节。当然,这打开了它自己的蠕虫罐,我希望继续访问实时数据。根据用户在运行查询时提供的日期范围,可能会返回超过10000行。

    我读过文章( https://sqlperformance.com/2012/07/t-sql-queries/split-strings -例如)这谈论了一些像CTE的事情,但我在那里没有太多的经验,所以我不确定如何实现这一飞跃。我读过的大多数“字符串解析”文章都是针对定界符相同的场景的,比如解析逗号分隔的字符串。但由于我有“滚动”分隔符,我不知道该如何处理。

    无论如何,如果你还和我在一起,如果你有任何意见,我将不胜感激。

    3 回复  |  直到 7 年前
        1
  •  2
  •   Xedni    7 年前

    为了提供一个具体的示例,说明如何使用XML实现这一点,下面是我的做法。我这里只有一个示例文档,但如果您有多个,同样适用。您只需为每个主键返回多行。

    ;with src (Id, NoteText) as
    (
        select 1, cast('<table>
    <tr>
    <td class="key">Dial Result:</td><td id="DialResult" class="value">No Answer</td></tr>
    <tr><td class="key">Client Name:</td><td id="ClientName" class="value">SMITH, BOB</td></tr>
    <tr><td class="key">Number Dialed:</td><td id="NumberDialed" class="value">5555555555 [Day]</td></tr>
    <tr><td class="key">Dial Count:</td><td id="DialCount" class="value">1</td></tr><tr><td class="key">Contact Made:</td><td id="ContactMade" class="value">No</td></tr>
    <tr><td class="key">Campaign Called On:</td><td id="CampaignCalledOn" class="value">TEST CAMPAIGN</td></tr>
    <tr><td class="key">Call Outcome:</td><td id="CallOutcome" class="value">No answer</td></tr>
    <tr><td class="key">Email Sent:</td><td id="EmailSent" class="value">No</td></tr>
    <tr><td class="key">Do Not Call Requested:</td><td id="DoNotCallRequested" class="value">No</td></tr>
    <tr><td class="key">Product Purchased:</td><td id="ProductPurchased" class="value">No</td></tr>
    <tr><td class="key">Order Number:</td><td id="OrderNumber" class="value">N/A</td></tr><tr><td class="key">Order Dollar Value:</td><td id="PurchaseAmount" class="value">0.00</td></tr>
    <tr><td class="key">Purchased SKUs:</td><td id="PurchasedSKUs" class="value">N/A</td></tr>
    </table>' as xml)
    )
    
    select 
        a.*,
        t.c.value('td[1]', 'varchar(max)')
    from src a
    cross apply a.NoteText.nodes('table/tr') as t(c)
    

    SQL中的XML分解文档可能有点命中或未命中,主要是因为SQL正在实现独立于SQL和Microsoft的xquery。这就是说,对于做基本的事情,通常MSDN文档可以让您很好地开始。

        2
  •  1
  •   Tab Alleman    7 年前

    如果你能修改数据库,我会尽我所能 . 尤其是如果这是一个用户经常为CRM应用程序之类的应用程序执行的查询。

    我要么将所有这些经常选择的列添加到我的表中,要么为它们创建一个单独的表,该表可以链接到此表的主键。

    然后,理想情况下,我会在插入时填充这些列。最好让INSERT通过存储过程,但如有必要,可以在表上放置INSERT触发器。

    如果我不能在插入时执行,那么我会运行一个作业,根据实际需要/允许的频率进行解析并填充列。

    对于大多数CRM应用程序,选择性能比插入性能更为关键,因为它会影响多行,而插入通常一次只发生一行。因此,如果可能的话,我会尽量把性能负担转移到那里。

    免责声明:我从未使用过Xedni在其评论中引用的XML函数。所以我说不出我的建议和他的建议在性能上有什么不同。

        3
  •  0
  •   Maurício Pontalti Neri    7 年前

    可以使用XML转换值,并在透视结果后进行转换

    DECLARE @xml xml = '<table>
    <tr>
    <td class="key">Dial Result:</td><td id="DialResult" class="value">No Answer</td></tr>
    <tr><td class="key">Client Name:</td><td id="ClientName" class="value">SMITH, BOB</td></tr>
    <tr><td class="key">Number Dialed:</td><td id="NumberDialed" class="value">5555555555 [Day]</td></tr>
    <tr><td class="key">Dial Count:</td><td id="DialCount" class="value">1</td></tr><tr><td class="key">Contact Made:</td><td id="ContactMade" class="value">No</td></tr>
    <tr><td class="key">Campaign Called On:</td><td id="CampaignCalledOn" class="value">TEST CAMPAIGN</td></tr>
    <tr><td class="key">Call Outcome:</td><td id="CallOutcome" class="value">No answer</td></tr>
    <tr><td class="key">Email Sent:</td><td id="EmailSent" class="value">No</td></tr>
    <tr><td class="key">Do Not Call Requested:</td><td id="DoNotCallRequested" class="value">No</td></tr>
    <tr><td class="key">Product Purchased:</td><td id="ProductPurchased" class="value">No</td></tr>
    <tr><td class="key">Order Number:</td><td id="OrderNumber" class="value">N/A</td></tr><tr><td class="key">Order Dollar Value:</td><td id="PurchaseAmount" class="value">0.00</td></tr>
    <tr><td class="key">Purchased SKUs:</td><td id="PurchasedSKUs" class="value">N/A</td></tr>
    </table>'
    
    
    Select 
     Tbl.Col.value('@id','varchar(max)') Id
    ,Tbl.Col.value('.','varchar(max)') Value
    FRom
     @xml.nodes('/table/tr/td[@class="value"]') Tbl(Col)  
    
    
    SELECT 
     DialResult
    ,ClientName
    ,NumberDialed
    ,DialCount
    ,ContactMade
    ,CampaignCalledOn
    ,CallOutcome
    ,EmailSent
    ,DoNotCallRequested
    ,ProductPurchased
    ,OrderNumber
    ,PurchaseAmount
    ,PurchasedSKUs
    FROM
    (
    Select 
     Tbl.Col.value('@id','varchar(max)') Id
    ,Tbl.Col.value('.','varchar(max)') Value
    FRom
     @xml.nodes('/table/tr/td[@class="value"]') Tbl(Col) ) T Pivot 
     (Max(Value) FOR  Id In(DialResult
    ,ClientName
    ,NumberDialed
    ,DialCount
    ,ContactMade
    ,CampaignCalledOn
    ,CallOutcome
    ,EmailSent
    ,DoNotCallRequested
    ,ProductPurchased
    ,OrderNumber
    ,PurchaseAmount
    ,PurchasedSKUs)) p;
    

    Id                                                 Value
    -------------------------------------------------- --------------------------------------------------
    DialResult                                         No Answer
    ClientName                                         SMITH, BOB
    NumberDialed                                       5555555555 [Day]
    DialCount                                          1
    ContactMade                                        No
    CampaignCalledOn                                   TEST CAMPAIGN
    CallOutcome                                        No answer
    EmailSent                                          No
    DoNotCallRequested                                 No
    ProductPurchased                                   No
    OrderNumber                                        N/A
    PurchaseAmount                                     0.00
    PurchasedSKUs                                      N/A
    

    DialResult                                         ClientName                                         NumberDialed                                       DialCount                                          ContactMade                                        CampaignCalledOn                                   CallOutcome                                        EmailSent                                          DoNotCallRequested                                 ProductPurchased                                   OrderNumber                                        PurchaseAmount                                     PurchasedSKUs
    -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
    No Answer                                          SMITH, BOB                                         5555555555 [Day]                                   1                                                  No                                                 TEST CAMPAIGN                                      No answer                                          No                                                 No                                                 No                                                 N/A                                                0.00                                               N/A
    
    (1 row(s) affected)