
ValueError: Found arrays with inconsistent numbers of samples

  • Roman Liuts  · Tech Community  · 9 years ago

    Here is my code:

    import pandas as pa
    from sklearn.linear_model import Perceptron
    from sklearn.metrics import accuracy_score
    
    def get_accuracy(X_train, y_train, y_test):
        perceptron = Perceptron(random_state=241)
        perceptron.fit(X_train, y_train)
        result = accuracy_score(y_train, y_test)
        return result
    
    test_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-test.csv")
    test_data.columns = ["class", "f1", "f2"]
    train_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-train.csv")
    train_data.columns = ["class", "f1", "f2"]
    
    accuracy = get_accuracy(train_data[train_data.columns[1:]], train_data[train_data.columns[0]], test_data[test_data.columns[0]])
    print(accuracy)
    

    I don't understand why I get this error:

    Traceback (most recent call last):
      File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 35, in <module>
        accuracy = get_accuracy(train_data[train_data.columns[1:]], 
    train_data[train_data.columns[0]], test_data[test_data.columns[0]])
      File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 22, in get_accuracy
        result = accuracy_score(y_train, y_test)
      File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 172, in accuracy_score
        y_type, y_true, y_pred = _check_targets(y_true, y_pred)
      File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 72, in _check_targets
        check_consistent_length(y_true, y_pred)
      File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
        "%s" % str(uniques))
    ValueError: Found arrays with inconsistent numbers of samples: [199 299]
    

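    I want to compute the accuracy with the accuracy_score method, but I get this error instead. I couldn't find anything on Google that helps. Can anyone explain what is going on?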

    1 answer  |  last activity 7 years ago
  •   Arya McCarthy Pankaj Pathak    7 years ago

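    sklearn.metrics.accuracy_score() takes y_true and y_pred arguments. That is, for the same dataset (presumably the test set), it wants to know the ground truth and the values predicted by your model. This lets it evaluate how well your model performs compared with a hypothetical perfect model.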

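    In your code, you are passing the true outcome variables from two different datasets: y_train and y_test. Both are ground truth, so comparing them says nothing about your model's ability to classify observations correctly! They also have different lengths (199 and 299 in your traceback), which is what triggers the ValueError.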
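    To make the length requirement concrete, here is a minimal pure-Python sketch of what an accuracy metric computes. Note that simple_accuracy is a hypothetical helper written for this illustration, not scikit-learn's implementation, and the label lists are made up:

```python
def simple_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label.

    Hypothetical illustration only; not scikit-learn's implementation.
    """
    if len(y_true) != len(y_pred):
        # Labels are compared pairwise, so both sequences must
        # describe the same samples and have the same length.
        raise ValueError(
            "inconsistent numbers of samples: [%d %d]" % (len(y_true), len(y_pred))
        )
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Same samples, true labels vs. model output: a meaningful score.
score = simple_accuracy([1, -1, 1, 1], [1, -1, -1, 1])

# Labels taken from two different datasets: the lengths differ, so this raises.
try:
    simple_accuracy([1, -1, 1], [1, -1])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

    Even if the two label arrays happened to have the same length, the score would only measure how often the labels of two unrelated datasets coincide, not how well the model predicts.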

    Updating your get_accuracy() function to also take X_test as a parameter is, I think, closer to what you intended:

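    def get_accuracy(X_train, y_train, X_test, y_test):
        # Fit the model on the training data only.
        perceptron = Perceptron(random_state=241)
        perceptron.fit(X_train, y_train)
        # Predict labels for the test features.
        pred_test = perceptron.predict(X_test)
        # Compare true test labels with predictions for the same samples.
        result = accuracy_score(y_test, pred_test)
        return result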
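
    As a rough usage sketch, the four-argument version can be called as below. The toy feature and label lists are invented for illustration (they are not the perceptron-train.csv / perceptron-test.csv data), and the training and test sets deliberately have different sizes to show that this is fine as long as y_test and the predictions describe the same samples:

```python
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score


def get_accuracy(X_train, y_train, X_test, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    pred_test = perceptron.predict(X_test)
    return accuracy_score(y_test, pred_test)


# Made-up toy data: 6 training samples, 3 test samples, 2 features each.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
y_train = [-1, -1, -1, 1, 1, 1]
X_test = [[0.5, 0.5], [3.5, 3.5], [0.0, 0.5]]
y_test = [-1, 1, -1]

accuracy = get_accuracy(X_train, y_train, X_test, y_test)
print(accuracy)
```

    With the DataFrames from the question, the equivalent call would presumably be get_accuracy(train_data[["f1", "f2"]], train_data["class"], test_data[["f1", "f2"]], test_data["class"]).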