代码之家  ›  专栏  ›  技术社区  ›  Ian Boyd

告诉我SQL Server全文搜索器疯了,不是我。

  •  5
  • Ian Boyd  · 技术社区  · 14 年前

    我有一些客户的特定地址是用户正在搜索的:

    123种通用方式

    数据库中有5行匹配:

    ResidentialAddress1
    =============================
    123 GENERIC WAY
    123 GENERIC WAY
    123 GENERIC WAY
    123 GENERIC WAY
    123 GENERIC WAY
    

    我运行一个ft查询来查找这些行。在向搜索添加更多条件时,我将向您显示每个步骤:

    SELECT ResidentialAddress1 FROM Patrons
    WHERE CONTAINS(Patrons.ResidentialAddress1, '"123*"')
    
    ResidentialAddress1
    =========================
    123 MAPLE STREET
    12345 TEST
    123 MINE STREET
    123 GENERIC WAY
    123 FAKE STREET
    ...
    
    (30 row(s) affected)
    

    好吧,到目前为止还不错,现在加上这个词” 通用的 “:

    SELECT ResidentialAddress1 FROM Patrons
    WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
    AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
    
    ResidentialAddress1
    =============================
    123 GENERIC WAY
    123 GENERIC WAY
    123 GENERIC WAY
    123 GENERIC WAY
    123 GENERIC WAY
    
    (5 row(s) affected)
    

    杰出的。现在我要添加用户想要确保存在的最后一个关键字:

    SELECT ResidentialAddress1 FROM Patrons
    WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
    AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
    AND CONTAINS(Patrons.ResidentialAddress1, '"way*"')
    
    
    ResidentialAddress1            
    ------------------------------ 
    
    (0 row(s) affected)
    

    呵呵?没有行吗?如果我只查询“way*”,该怎么办:

    SELECT ResidentialAddress1 FROM Patrons
    WHERE CONTAINS(Patrons.ResidentialAddress1, '"way*"')
    
    ResidentialAddress1            
    ------------------------------ 
    
    (0 row(s) affected)
    

    一开始我以为可能是因为 * 它要求根 way 后面有更多的字符。但事实并非如此:

    • 搜索“123*”与“123”匹配
    • 搜索“generic*”匹配“generic”
    • 网上图书说, 星号匹配零个、一个或多个字符

    如果我移除 * 仅限S&G:

    SELECT ResidentialAddress1 FROM Patrons
    WHERE CONTAINS(Patrons.ResidentialAddress1, '"way"')
    
    Server: Msg 7619, Level 16, State 1, Line 1
    A clause of the query contained only ignored words. 
    

    所以有人可能认为你是不被允许的 搜索 对于 方式 或作为根。但这也不是真的:

    SELECT * FROM Patrons
    WHERE CONTAINS(Patrons.*, '"way*"')
    
    AccountNumber FirstName Lastname
    ------------- --------- --------
    33589         JOHN      WAYNE                    
    

    总而言之,用户正在搜索包含所有单词的行:

    123种通用方式

    正确地说,我把它翻译成 WHERE 条款:

    SELECT * FROM Patrons
    WHERE CONTAINS(Patrons.*, '"123*"')
    AND CONTAINS(Patrons.*, '"generic*"')
    AND CONTAINS(Patrons.*, '"way*"')
    

    它不返回任何行。告诉我这不起作用,这不是我的错,而且SQL Server很疯狂。

    注: 我清空了FT索引并重新构建了它。

    更新一

    SELECT Lastname, ResidentialAddress1 FROM Patrons
    WHERE CONTAINS(Patrons.*, '"gen*"')
    
    Lastname                  ResidentialAddress1            
    ------------------------- ------------------------------ 
    SAVE                      123 GENERIC WAY
    Genders                   
    SAVE                      123 GENERIC WAY
    Patron                    123 GENERIC WAY
    SAVE                      123 GENERIC WAY
    SAVE                      234 GENERIC WAY
    SAVE                      123 GENERIC WAY
    
    (7 row(s) affected)
    

    更新二

    假装用户输入:

    123通用佤族

    SELECT ResidentialAddress1 FROM Patrons
    WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
    AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
    AND CONTAINS(Patrons.ResidentialAddress1, '"wa*"')
    
    ResidentialAddress1            
    ------------------------------ 
    
    (0 row(s) affected)
    

    真正的问题是,用户输入的内容完全有效,他们希望看到任何人都希望看到的内容。


    更新三

    有人要求这么多,这不是我的错!:

    CREATE TABLE [dbo].[Patrons] (
        [PatronGUID]  uniqueidentifier ROWGUIDCOL  NOT NULL ,
        [AccountNumber] [bigint] NULL ,
        [FirstName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [MiddleInitial] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [Lastname] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [EyeColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [HairColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [Gender] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [Birthday] [datetime] NULL ,
        [Height] [int] NULL ,
        [Weight] [int] NULL ,
        [FacialHair] [tinyint] NULL ,
        [Nationality] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [IdentifyingMarks] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [DriversLicenseNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [DriversLicenseRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [DriversLicenseCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [DriversLicenseExpires] [datetime] NULL ,
        [DriversLicenseDateVerified] [datetime] NULL ,
        [PassportNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [PassportRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [PassportCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [PassportExpires] [datetime] NULL ,
        [PassportDateVerified] [datetime] NULL ,
        [OtherIdentificationNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [OtherIdentificationRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [OtherIdentificationCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [OtherIdentificationType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [OtherIdentificationExpires] [datetime] NULL ,
        [OtherIdentificationDateVerified] [datetime] NULL ,
        [ResidentialAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ResidentialAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ResidentialAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ResidentialCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ResidentialZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ResidentialRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ResidentialCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ResidentialPhoneNumber] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [CountryOfResidence] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [BusinessPhone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [PositionWithFirm] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [EmployerTelephone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [MemberCardType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [PlayerStatusCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [AccountType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [AccountStatus1] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [AccountStatus2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [IsVIPExchangeRate] [tinyint] NULL ,
        [ChangedUserGUID_Depricated] [uniqueidentifier] NULL ,
        [ChangedUser] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [ChangedDate] [datetime] NULL ,
        [ChangedWorkstation] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [PendingUpdates_Depricated] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL 
    ) ON [PRIMARY]
    GO
    
    ALTER TABLE [dbo].[Patrons] ADD 
        CONSTRAINT [DF_Patrons_PatronGUID] DEFAULT (newid()) FOR [PatronGUID],
        CONSTRAINT [PK_Patrons] PRIMARY KEY  NONCLUSTERED 
        (
            [PatronGUID]
        ) WITH  FILLFACTOR = 90  ON [PRIMARY] 
    GO
    
    if (select DATABASEPROPERTY(DB_NAME(), N'IsFullTextEnabled')) <> 1 
    exec sp_fulltext_database N'enable' 
    
    GO
    
    if not exists (select * from dbo.sysfulltextcatalogs where name = N'TheFullTextCatalog')
    exec sp_fulltext_catalog N'TheFullTextCatalog', N'create' 
    
    GO
    
    exec sp_fulltext_table N'[dbo].[Patrons]', N'create', N'TheFullTextCatalog', N'PK_Patrons'
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'FirstName', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'MiddleInitial', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'Lastname', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'EyeColor', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'IdentifyingMarks', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress1', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress2', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress3', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCity', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialZipCode', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialRegion', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCountry', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialPhoneNumber', N'add', 1033  
    GO
    
    exec sp_fulltext_column N'[dbo].[Patrons]', N'CountryOfResidence', N'add', 1033  
    GO
    
    exec sp_fulltext_table N'[dbo].[Patrons]', N'activate'  
    GO
    

    以下是那个不相信我的人的截图:

    应有效但无效的查询: alt text http://i49.tinypic.com/dbo8w9.png

    有效但不有用的查询: alt text http://i49.tinypic.com/30mptm9.png

    对证明内容有效但不有用的查询: alt text http://i49.tinypic.com/2q04nmc.png


    更新四

    查询 不能 被写为

    CONTAINS(Patrons.*, 'words...')
    

    因为有一些项目在逻辑上或物理上没有被FT索引覆盖。例如用户查询 为:

    2010年6月4日伊恩·博伊德619

    显示四个关键字:

    • 6/4/2010
    • 伊恩
    • 博伊德
    • 六百一十九

    这意味着他们希望所有条件都为真,伪代码为:

    WHERE 6/4/2010 is in the row
    AND ian is in the row
    AND boyd is in the row
    AND 619 is in the row
    

    将其转换为以下部分查询:

    WHERE --Keyword 1: 6/4/2010
    (
       ((ChangedDate >= '20100604') AND (ChangedDate < '20100605'))
       OR 
       ((LastTransactionDate >= '20100604') AND (LastTransactionDate < '20100605'))
       OR 
       (CONTAINS(Patrons.*, '"6/4/2010*"')
    )
    AND --Keyword 2: ian
    (
        CONTAINS(Patrons.*, '"ian*"')
    )
    AND --Keyword 3: boyd
    (
        CONTAINS(Patrons.*, '"boyd*"')
    )
    AND --Keyword 4: 619
    (
        (AccountNumber IN (SELECT CAST(619 AS bigint)))
        OR
        (CONTAINS(Patrons.*, '"619*"'))
    )
    

    其中一个回答者正在看原始问题中给出的简化示例,而不是现实世界。说它是 不正确的 有多重的 AND 条款是无限制的。

    5 回复  |  直到 14 年前
        1
  •  6
  •   Steven Sudit    14 年前

    消息告诉您“way”是一个停止字,这意味着它被忽略并且没有索引。这就是为什么你能找到“韦恩”而不是“韦恩”。

    所以,不,这不疯狂,你也不疯狂。只是一个简单的误会。

        2
  •  4
  •   Aaronaught    14 年前

    创建FT索引时,可能使用了系统非索引字表。单词 way 正好在里面。您可以通过这个查询看到它:

    SELECT *
    FROM sys.fulltext_system_stopwords
    WHERE stopword = 'way'
    AND language_id = 1033
    

    您可以关闭非索引字表或创建自定义非索引字表,但更好的解决方案是正确地编写查询;不要使用多个非索引字表 WHERE CONTAINS 从句,把它们组合成一个。否则,SQL Server可能无法有效地使用FT索引。

    您的查询应该如下所示:

    SELECT ResidentialAddress1 FROM Patrons
    WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*" AND "generic*" AND "way*"')
    

    如果您这样做,停止字就会被忽略;它仍然会返回所有相同的结果,如果您没有包含这个术语,它会返回所有相同的结果。 way* .


    编辑:刚注意到你标记了这个 sql-server-2000 ,因此第一个查询可能不起作用。在SQL 2000中,它们是“干扰词”,我相信配置是全局的,您没有单独的停止列表。不过,如果你写一个 其中包含 从句而不是几个。

    要编辑SQL Server 2000中的干扰词,必须在SQL Server中编辑特定于语言的文件。 FTDATA 配置文件夹。更多详细信息如下: SQL Server Full Text Search Noise Words and Thesaurus Configurations .

        3
  •  3
  •   Darknight    14 年前

    解决方案1:

    你想试试 Transform Noise Word 选项 (SQL 2008) .

    关闭此项,应停止删除单词。

    例子:

    sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    sp_configure 'transform noise words', 1
    RECONFIGURE
    GO
    

    编辑1:

    希望旧版本的MS SQL也有类似的功能?

        4
  •  2
  •   Josh Smeaton    14 年前
        5
  •  0
  •   D'Arcy Rittich    14 年前

    也许它需要三个以上的字母字符。再试试三个字母的单词,比如 gen* .