
ValueError: not enough values to unpack

  •  0
  • iraciv94  · Tech Community  · 6 years ago

    I am trying to learn (on Python 3) how to do sentiment analysis for NLP, and I am using the "UMICH SI650 - Sentiment Classification" dataset available on Kaggle: https://www.kaggle.com/c/si650winter11

    At the moment I am trying to generate a vocabulary with some loops. Here is the code:

        import collections
        import nltk
        import os
    
        Directory = "../Databases"
    
    
        # Read training data and generate vocabulary
        max_length = 0
        freqs = collections.Counter()
        num_recs = 0
        training = open(os.path.join(Directory, "train_sentiment.txt"), 'rb')
        for line in training:
            if not line:
                continue
            label, sentence = line.strip().split("\t".encode())
            words = nltk.word_tokenize(sentence.decode("utf-8", "ignore").lower())
            if len(words) > max_length:
                max_length = len(words)
            for word in words:
                freqs[word] += 1
            num_recs += 1
        training.close()
    

    I keep getting this error, which I don't fully understand:

    ValueError: not enough values to unpack (expected 2, got 1)

    I tried adding

        if not line:
            continue
    

    as suggested here: "ValueError : not enough values to unpack. why?", but that did not work in my case. How can I fix this error?

    Thanks a lot in advance,

    3 replies  |  as of 6 years ago
        1
  •  1
  •   alvas    6 years ago

    The following is based on the data from https://www.kaggle.com/c/si650winter11

    First, context managers are your friend. Use them: http://book.pythontips.com/en/latest/context_managers.html

    Second, open the file in text mode with open(filename, 'r') rather than open(filename, 'rb'); then there is no need to deal with str/bytes or with encoding/decoding.

    Now:

    from nltk import word_tokenize
    from collections import Counter
    word_counts = Counter()
    with open('training.txt', 'r') as fin:
        for line in fin:
            label, text = line.strip().split('\t')
            # Avoid lowercasing before tokenization.
            # Lowercasing after tokenization is much better,
            # just in case the tokenizer uses capitalization as a cue.
            word_counts.update(map(str.lower, word_tokenize(text)))
    
    print(word_counts)
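
To illustrate the second point about text mode versus binary mode, here is a small sketch. The file name `sample.txt` and its single record are made up for the demonstration; they are not taken from the Kaggle data.

```python
import os
import tempfile

# Hypothetical one-record file, written only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as fout:
        fout.write("1\tI love this movie\n")

    # Binary mode yields bytes, so the separator must be bytes as well.
    with open(path, "rb") as fin:
        raw = next(fin)

    # Text mode yields str, so a plain "\t" works and no decode() is needed.
    with open(path, "r", encoding="utf-8") as fin:
        text = next(fin)

bytes_fields = raw.strip().split(b"\t")
str_fields = text.strip().split("\t")

print(type(raw).__name__, bytes_fields)
print(type(text).__name__, str_fields)
```

Passing `encoding="utf-8"` explicitly is my own addition; it assumes the data file is UTF-8, which the question's `decode("utf-8", "ignore")` call suggests but does not prove.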
    
        2
  •  1
  •   PMende    6 years ago

    Wrap the unpacking statement in a try/except block. Something like:

    try:
        label, sentence = line.strip().split("\t".encode())
    except ValueError:
        print(f'Error line: {line}')
        continue
    

    My guess is that some of your lines have a label followed by nothing but whitespace.
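
A minimal sketch of that guess, using made-up sample lines (I have not inspected the actual data file). It also shows why the `if not line` check from the question does not help: a blank line read from a file still contains the newline character, so it is truthy.

```python
# Hypothetical sample lines, as bytes because the question opens the file with 'rb'.
good = b"1\tI love this movie\n"
label_only = b"1\t   \n"  # a label, a tab, then only whitespace
blank = b"\n"             # a blank line as the file iterator yields it


def fields(line):
    # Same strip-then-split step as in the question.
    return line.strip().split(b"\t")


print(fields(good))        # [b'1', b'I love this movie']
print(fields(label_only))  # [b'1']  strip() removed the trailing tab too
print(fields(blank))       # [b'']   one empty field, not zero fields
print(bool(blank))         # True, so `if not line` never skips it
```

In the last two cases the split yields a single element, which matches the "expected 2, got 1" message.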

        3
  •  0
  •   robx    6 years ago

    You should check for the case where the number of fields is wrong:

     if not line:
         continue
     fields = line.strip().split("\t".encode())
     if len(fields) != 2:
         # you could print(fields) here to help debug
         continue
     label, sentence = fields
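
Putting the field-count check into the whole loop might look roughly like the sketch below. Several things here are assumptions of mine rather than part of the question: the function name `build_vocab`, reading text lines (str) instead of bytes, the sample data, and the use of `str.split` as a stand-in tokenizer so the example runs without NLTK. Pass `nltk.word_tokenize` as `tokenize` to get the original behaviour.

```python
from collections import Counter


def build_vocab(lines, tokenize=str.split):
    """Count word frequencies, skipping lines that lack exactly two fields."""
    freqs = Counter()
    max_length = 0
    num_recs = 0
    skipped = []  # (line number, raw line) pairs, kept for debugging
    for lineno, line in enumerate(lines, start=1):
        fields = line.strip().split("\t")
        if len(fields) != 2:
            skipped.append((lineno, line))
            continue
        label, sentence = fields
        # Lowercase after tokenizing, as suggested in the first answer.
        words = [word.lower() for word in tokenize(sentence)]
        max_length = max(max_length, len(words))
        freqs.update(words)
        num_recs += 1
    return freqs, max_length, num_recs, skipped


# Hypothetical sample data: two good records, a blank line, a label-only line.
sample = ["1\tThe movie was great\n", "\n", "0\t\n", "0\tThe plot was dull\n"]
freqs, max_length, num_recs, skipped = build_vocab(sample)
print(num_recs, max_length, [lineno for lineno, _ in skipped])
```

Collecting the skipped lines instead of silently dropping them makes it easy to see afterwards which records in the real file are malformed.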