
AttributeError when designing a Naive Bayes classifier

  •  1
  • Olivia Brown  · Tech Community  · 7 years ago

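    I am trying to create a simple Naive Bayes classifier to classify data between the two classes shown in the code below, but I am stuck on the following error. Can anyone tell me what I am doing wrong?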

    Traceback (most recent call last):
      File "NBC.py", line 33, in <module>
        test(['Apple', 'Banana'])
      File "NBC.py", line 16, in test
        prob_dist = classifier.prob_classify(lst)
      File "/home/***/.local/lib/python3.6/site-packages/nltk/classify/naivebayes.py", line 95, in prob_classify
        for fname in list(featureset.keys()):
    AttributeError: 'list' object has no attribute 'keys'
    

    “NBC.py”

    from nltk.classify import NaiveBayesClassifier
    
    dataFruits = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 
                  'Lemon', 'Mangos', 'Orange', 'Strawberry', 'Watermelon']
    
    dataVeggies = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 
                   'Barccoli', 'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']
    
    def create_features(word):
        my_dict = dict([(word, True)])
        return my_dict
    
    def test(words):
        lst = [create_features(wd) for wd in words]
    
        prob_dist = classifier.prob_classify(lst)
        print(prob_dist.prob('fruit'))
    
    class1= [(create_features(item), 'fruit') for item in dataFruits]
    #print(class1)
    
    class2 = [(create_features(item), 'veggie') for item in dataVeggies]
    #print(class2)
    
    train_set = class1[:] + class2
    print(train_set)
    
    # Train
    classifier = NaiveBayesClassifier.train(train_set)
    
    
    # Predict
    test(['Apple', 'Banana'])
    
    1 answer  |  as of 7 years ago
  •  1
  •   user2314737    7 years ago

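    What your code is trying to do is build a very simple classifier based on name features. Based on its name, an item is classified as either 'fruit' or 'veggie'. The training set contains several names together with their respective classes.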

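    The error you are getting is caused by the wrong format of your training and test sets. Concretely, the traceback is raised because prob_classify expects a single feature dictionary, while test passes it a list of dictionaries, and a list has no keys() method. The training set is a list of featuresets (one featureset per training example) and should have a structure of the form: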

    training_set = [featureset1, featureset2, ...]
    

    Each featureset is a pair (features, class), where features is a dictionary

    {'f1': value1, 'f2': value2, ...}
    

    and class is some value. For example, the featureset for 'Apple' in your classifier is:

    ({'Apple': True,
      'Banana': False,
      'Broccoli': False,
      'Cabbage': False,
      'Carrot': False,
      'Cherry': False,
      'Cucumber': False,
      'Eggplant': False,
      'Grape': False,
      'Guava': False,
      'Lemon': False,
      'Mangos': False,
      'Onion': False,
      'Orange': False,
      'Pea': False,
      'Potato': False,
      'Spinach': False,
      'Strawberry': False,
      'Tomato': False,
      'Watermelon': False},
     'fruit')
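
As a quick sanity check on this structure, the sketch below uses plain Python (no NLTK needed) to confirm that a training set is a list of (features, label) pairs, and to show why a list of dictionaries triggers the AttributeError from the question. The helper name `check_training_set` and the shortened three-name feature list are hypothetical and only serve the illustration; they are not part of NLTK.

```python
def create_features(word, feature_names):
    # One boolean feature per known name; only `word` is set to True.
    features = {name: False for name in feature_names}
    features[word] = True
    return features


def check_training_set(training_set):
    # Hypothetical helper: verify the (features, label) structure.
    for features, label in training_set:
        if not isinstance(features, dict):
            raise TypeError(
                'features must be a dict, got {}'.format(type(features).__name__))
    return True


names = ['Apple', 'Banana', 'Carrot']  # shortened list for illustration
train_set = [(create_features('Apple', names), 'fruit'),
             (create_features('Carrot', names), 'veggie')]
structure_ok = check_training_set(train_set)

# The features of a single featureset are a dict, so .keys() is available.
single = create_features('Apple', names)
single_has_keys = hasattr(single, 'keys')

# A list of feature dicts (what the question passed to prob_classify)
# has no .keys() attribute, which is what the AttributeError reports.
many = [create_features(w, names) for w in ['Apple', 'Banana']]
many_has_keys = hasattr(many, 'keys')

print(structure_ok, single_has_keys, many_has_keys)
```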
    

    Here is the corrected code:

    from nltk.classify import NaiveBayesClassifier, accuracy
    
    dataFruits = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 
                  'Lemon', 'Mangos', 'Orange', 'Strawberry', 'Watermelon']
    
    dataVeggies = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 
                   'Broccoli', 'Tomato', 'Pea', 'Cucumber', 'Eggplant']
    
    def create_features(word, featureNames):
        # One boolean feature per known name; only the given word is True
        my_dict = dict([(w, False) for w in featureNames])
        my_dict[word] = True
        return my_dict
    
    def test(word):
        lst = create_features(word, allFeatures)
        prob_dist = classifier.prob_classify(lst)
        print('{}'.format(word))
        print('Fruit probability: {:.2f}\tVeggie probability: {:.2f}'.format( prob_dist.prob('fruit'), prob_dist.prob('veggie')))
        return prob_dist
    
    allFeatures = dataFruits + dataVeggies
    class1= [(create_features(item, allFeatures), 'fruit') for item in dataFruits]
    
    class2 = [(create_features(item, allFeatures), 'veggie') for item in dataVeggies]
    
    train_set = class1[:] + class2
    test_set = [(create_features(item, allFeatures), 'fruit') for item in ['Apple','Banana']]
    
    # Train
    classifier = NaiveBayesClassifier.train(train_set)
    
    
    # Predict
    test('Strawberry')
    test('Strawby')
    
    # Accuracy on test set
    print('Accuracy on test set: {:.2f}'.format(accuracy(classifier, test_set)))  
    

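    Here is a slightly better classifier, which may be what you had in mind (it follows the document classification example in http://www.nltk.org/book/ch06.html). Here the classifier just predicts whether a basket contains more fruits or more veggies. Based on this you can construct more complex classifiers (with better features and more training data).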

    from nltk.classify import NaiveBayesClassifier, accuracy
    
    dataFruits = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 
                  'Lemon', 'Mangos', 'Orange', 'Strawberry', 'Watermelon']
    
    dataVeggies = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 
                   'Broccoli', 'Tomato', 'Pea', 'Cucumber', 'Eggplant']
    
    
    def basket_features(basket): 
        basket_items = set(basket) 
        features = {}
        for item in allFeatures:
            features['contains({})'.format(item)] = (item in basket_items)
        return features
    
    def test(basket):
        lst = basket_features(basket)
        prob_dist = classifier.prob_classify(lst)
        print('Basket: {}'.format(basket))
        print('Fruit probability: {:.2f}\tVeggie probability: {:.2f}'.format(prob_dist.prob('fruit'), prob_dist.prob('veggie')))
        return prob_dist
    
    allFeatures = dataFruits + dataVeggies
    class1= [(basket_features([item]), 'fruit') for item in dataFruits]
    
    class2 = [(basket_features([item]), 'veggie') for item in dataVeggies]
    
    train_set = class1[:] + class2
    
    # Train
    classifier = NaiveBayesClassifier.train(train_set)
    
    
    # Predict
    test(['Apple', 'Banana', 'Cherry', 'Carrot', 'Eggplant', 'Cabbage','Pea'])
    test(['Apple', 'Banana',  'Mangos', 'Carrot', 'Eggplant', 'Cabbage','Pea', 'Cucumber'])
    test(['Apple', 'Banana'])
    test(['Apple', 'Banana', 'Grape'])
    
    classifier.show_most_informative_features(5)
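
To see what the basket classifier actually receives, it can help to print the feature dictionary for a small basket. The sketch below is plain Python and makes two assumptions for the sake of a short example: it uses a three-name feature list instead of the full twenty, and it passes that list as a parameter (`feature_names`) rather than reading the global allFeatures.

```python
def basket_features(basket, feature_names):
    # One 'contains(<name>)' feature per known name, True if it is in the basket.
    basket_items = set(basket)
    return {'contains({})'.format(name): (name in basket_items)
            for name in feature_names}


feature_names = ['Apple', 'Banana', 'Carrot']  # shortened list for illustration
features = basket_features(['Apple', 'Carrot', 'Kiwi'], feature_names)
for key in sorted(features):
    print(key, features[key])
```

Items that are not in the feature list, such as 'Kiwi' here, produce no feature at all, so they have no influence on the prediction.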