
ValueError: not enough values to unpack

sentiment-analysis nltk nlp dictionary python-3.x

iraciv94 · Tech Community · 6 years ago

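I am trying to learn (in Python 3) how to do sentiment analysis for NLP, using the "UMICH SI650 - Sentiment Classification" dataset available on Kaggle: https://www.kaggle.com/c/si650winter11

At the moment I am trying to build a vocabulary with a few loops. Here is the code: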

    import collections
    import nltk
    import os

    Directory = "../Databases"


    # Read training data and generate vocabulary
    max_length = 0
    freqs = collections.Counter()
    num_recs = 0
    training = open(os.path.join(Directory, "train_sentiment.txt"), 'rb')
    for line in training:
        if not line:
            continue
        label, sentence = line.strip().split("\t".encode())
        words = nltk.word_tokenize(sentence.decode("utf-8", "ignore").lower())
        if len(words) > max_length:
            max_length = len(words)
        for word in words:
            freqs[word] += 1
        num_recs += 1
    training.close()

I keep getting this error, which I don't fully understand:

ValueError: not enough values to unpack (expected 2, got 1)

I tried adding

    if not line:
        continue

as suggested here: "ValueError : not enough values to unpack. why?", but it did not work in my case. How can I fix this error?

Thanks a lot in advance,

3 answers | active 6 years ago

alvas · 6 years ago

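The code below uses the data from https://www.kaggle.com/c/si650winter11

First, context managers are your friend, so use them: http://book.pythontips.com/en/latest/context_managers.html

Second, use open(filename, 'r') rather than open(filename, 'rb'); then there is no need to deal with str/bytes and encoding/decoding.

Now: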

from nltk import word_tokenize
from collections import Counter
word_counts = Counter()
with open('training.txt', 'r') as fin:
    for line in fin:
        label, text = line.strip().split('\t')
        # Avoid lowercasing before tokenization.
        # Lowercasing after tokenization is much better,
        # just in case the tokenizer uses capitalization as a cue.
        word_counts.update(map(str.lower, word_tokenize(text)))

print(word_counts)
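To illustrate the second point, here is a minimal sketch of the difference between the two modes. The sample rows and the file name `sample.txt` are made up for the illustration and are not taken from the Kaggle data.

```python
import os
import tempfile

# Hypothetical two-line sample in the label<TAB>sentence layout.
sample = "1\tI love this movie\n0\tThis was awful\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fout:
        fout.write(sample)

    # Binary mode yields bytes, so the separator must be bytes too,
    # and the sentence still has to be decoded before tokenizing.
    with open(path, "rb") as fin:
        binary_rows = [line.strip().split(b"\t") for line in fin]

    # Text mode yields str, so plain string methods work directly.
    with open(path, "r", encoding="utf-8") as fin:
        text_rows = [line.strip().split("\t") for line in fin]

print(binary_rows)
print(text_rows)
```

Passing `encoding="utf-8"` explicitly is a choice made for this sketch; without it, text mode falls back to the platform's default encoding.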

PMende · 6 years ago

Put the unpacking in a try/except block. Something like:

try:
    label, sentence = line.strip().split("\t".encode())
except ValueError:
    print(f'Error line: {line}')
    continue

My guess is that some of your lines have a label followed by nothing but whitespace.
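A small sketch of why such lines produce "expected 2, got 1", and why the `if not line` check in the question does not catch them. The byte strings below are hypothetical examples of what iterating over a file opened in 'rb' mode could yield; whether the real file contains lines like these is not confirmed in this thread.

```python
# Hypothetical lines as they would arrive from a file opened in 'rb' mode.
blank_line = b"\n"
label_only = b"1\t   \n"
well_formed = b"1\tI love this movie\n"

# A line read from a file keeps its newline, so a "blank" line is
# not empty and `if not line` does not skip it.
blank_is_truthy = bool(blank_line)

# strip() removes the trailing tab and whitespace, so only one field
# is left and unpacking into two names raises ValueError.
blank_fields = blank_line.strip().split(b"\t")
label_only_fields = label_only.strip().split(b"\t")
well_formed_fields = well_formed.strip().split(b"\t")

print(blank_is_truthy, blank_fields, label_only_fields, well_formed_fields)
```

Testing `if not line.strip()` instead would skip blank lines, but a label-only line would still need the try/except or a length check.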

robx · 6 years ago

You should check for lines with the wrong number of fields:

if not line:
    continue
fields = line.strip().split("\t".encode())
if len(fields) != 2:
    # you could print(fields) here to help debug
    continue
label, sentence = fields
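Putting the field-count check together with text-mode lines, the whole vocabulary loop might look like the sketch below. `build_vocabulary`, its `tokenize` parameter and the sample lines are hypothetical names and data for this illustration. The default tokenizer is plain whitespace splitting so that the sketch runs without NLTK data; pass `nltk.word_tokenize` to match the question.

```python
import collections


def build_vocabulary(lines, tokenize=str.split):
    """Count word frequencies from label<TAB>sentence lines.

    Hypothetical helper for illustration. Lines that do not have
    exactly two tab-separated fields are skipped and their line
    numbers are returned, so they can be inspected afterwards.
    """
    freqs = collections.Counter()
    max_length = 0
    num_recs = 0
    skipped = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.strip().split("\t")
        if len(fields) != 2:
            skipped.append(line_number)
            continue
        label, sentence = fields
        # Lowercase after tokenizing, as suggested in the first answer.
        words = [word.lower() for word in tokenize(sentence)]
        max_length = max(max_length, len(words))
        freqs.update(words)
        num_recs += 1
    return freqs, max_length, num_recs, skipped


# Hypothetical sample: two good rows, one blank line, one label-only line.
sample_lines = [
    "1\tI love this movie\n",
    "\n",
    "0\t   \n",
    "0\tThis movie was awful\n",
]
freqs, max_length, num_recs, skipped = build_vocabulary(sample_lines)
print(freqs.most_common(3), max_length, num_recs, skipped)
```

With a real file, the call would be along the lines of `build_vocabulary(fin)` inside a `with open(...) as fin:` block, since a text-mode file object iterates over its lines.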