
How do I enable stemming when searching with Lucene.NET?

  •  15
  • devson  ·  15 years ago

    2 answers  |  latest 15 years ago
        1
  •  21
  •   Jack Ryan    13 years ago

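    To do this, you need to write your own analyzer class, which is relatively straightforward. Here is the one I am using. It combines stop-word filtering, Porter stemming, and (this may be more than you need) stripping accents from characters.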

    using System.Collections;
    using Lucene.Net.Analysis;

    /// <summary>
    /// An analyzer that chains several filters: stop-word filtering,
    /// diacritic stripping, and Porter stemming.
    /// </summary>
    public class CustomAnalyzer : Analyzer
    {
        /// <summary>
        /// A rather short list of stop words that is fine for basic search use.
        /// </summary>
        private static readonly string[] stopWords = new[]
        {
            "0", "1", "2", "3", "4", "5", "6", "7", "8",
            "9", "000", "$", "£",
            "about", "after", "all", "also", "an", "and",
            "another", "any", "are", "as", "at", "be",
            "because", "been", "before", "being", "between",
            "both", "but", "by", "came", "can", "come",
            "could", "did", "do", "does", "each", "else",
            "for", "from", "get", "got", "has", "had",
            "he", "have", "her", "here", "him", "himself",
            "his", "how","if", "in", "into", "is", "it",
            "its", "just", "like", "make", "many", "me",
            "might", "more", "most", "much", "must", "my",
            "never", "now", "of", "on", "only", "or",
            "other", "our", "out", "over", "re", "said",
            "same", "see", "should", "since", "so", "some",
            "still", "such", "take", "than", "that", "the",
            "their", "them", "then", "there", "these",
            "they", "this", "those", "through", "to", "too",
            "under", "up", "use", "very", "want", "was",
            "way", "we", "well", "were", "what", "when",
            "where", "which", "while", "who", "will",
            "with", "would", "you", "your",
            "a", "b", "c", "d", "e", "f", "g", "h", "i",
            "j", "k", "l", "m", "n", "o", "p", "q", "r",
            "s", "t", "u", "v", "w", "x", "y", "z"
        };
    
        private Hashtable stopTable;
    
        /// <summary>
        /// Creates an analyzer with the default stop word list.
        /// </summary>
        public CustomAnalyzer() : this(stopWords) {}
    
        /// <summary>
        /// Creates an analyzer with the passed in stop words list.
        /// </summary>
        public CustomAnalyzer(string[] stopWords)
        {
            stopTable = StopFilter.MakeStopSet(stopWords);       
        }
    
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
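            // Pipeline: lowercase letter tokens -> drop stop words -> strip accents -> Porter stem.
            // Pass stopTable (built in the constructor); passing the static stopWords array
            // here would silently ignore a custom stop-word list.
            return new PorterStemFilter(new ISOLatin1AccentFilter(new StopFilter(new LowerCaseTokenizer(reader), stopTable)));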
        }
    }
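
    To see what the analyzer emits, you can dump its tokens. This is only a sketch under a few assumptions: the CustomAnalyzer class above is compiled into the same project, and you are on a Lucene.Net 2.x-era release where TokenStream.Next() returns a Token exposing TermText() (later releases changed the token API). AnalyzerDemo and the "body" field name are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;

// Hypothetical console program; assumes CustomAnalyzer from the answer above
// and the Lucene.Net 2.x token API (Next() / TermText()).
public static class AnalyzerDemo
{
    public static void Main()
    {
        Analyzer analyzer = new CustomAnalyzer();
        TokenStream stream = analyzer.TokenStream("body", new StringReader("The runners were running"));

        // Next() returns null once the stream is exhausted.
        Token token;
        while ((token = stream.Next()) != null)
        {
            Console.WriteLine(token.TermText());
        }
        stream.Close();
    }
}
```

    With the default list, stop words such as "the" and "were" should be dropped, and "running" should come out stemmed to "run".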
    
        2
  •  7
  •   Yuval F    15 years ago

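    You can use Snowball or PorterStemFilter. See the Java Analyzer documentation for guidance on combining different filters, tokenizers, and analyzers. Note that you must use the same analyzer for indexing and for retrieval, so stemming has to be applied from index time onward.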
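
    The point about using one analyzer on both sides can be sketched roughly as follows. This is an illustration, not a definitive implementation: it assumes the CustomAnalyzer class from the other answer and a Lucene.Net 2.x-era API (IndexWriter(Directory, Analyzer, bool), Field.Index.TOKENIZED, Hits), parts of which were deprecated or removed in later releases. SearchDemo, the "body" field, and the sample text are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical console program; assumes CustomAnalyzer is available
// and a Lucene.Net 2.x-era API.
public static class SearchDemo
{
    public static void Main()
    {
        // One analyzer shared by indexing and querying, so both sides stem the same way.
        Analyzer analyzer = new CustomAnalyzer();
        Directory directory = new RAMDirectory();

        // Index time: the analyzer stems the field text before it is stored in the index.
        IndexWriter writer = new IndexWriter(directory, analyzer, true);
        Document doc = new Document();
        doc.Add(new Field("body", "She was running late", Field.Store.YES, Field.Index.TOKENIZED));
        writer.AddDocument(doc);
        writer.Close();

        // Query time: the same analyzer stems the query terms.
        QueryParser parser = new QueryParser("body", analyzer);
        Query query = parser.Parse("runs");

        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Console.WriteLine("Matches: " + hits.Length());
        searcher.Close();
    }
}
```

    Because "running" in the document and "runs" in the query should both reduce to the stem "run", the query is expected to match. If the index had been built with a non-stemming analyzer, the stemmed query term would be compared against unstemmed index terms and could miss.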