
How to fix the error: AttributeError: 'generator' object has no attribute 'endswith'

  •  1
  • alpha5401  · Tech Community  · 6 years ago

    When I try to run the code below to preprocess text, I get the following error. Someone else had a similar problem, but that post did not have enough detail.

    I'm posting everything here in the hope that it helps reviewers help me better.

    Here is the function:

    def preprocessing(text):
        #text=text.decode("utf8")
        #tokenize into words
        tokens=[word for sent in nltk.sent_tokenize(text) for word in 
        nltk.word_tokenize(sent)]
        #remove stopwords
        stop=stopwords.words('english')
        tokens=[token for token in tokens if token not in stop]
        #remove words less than three letters
        tokens=[word for word in tokens if len(word)>=3]
        #lower capitalization
        tokens=[word.lower() for word in tokens]
        #lemmatization
        lmtzr=WordNetLemmatizer()
        tokens=[lmtzr.lemmatize(word for word in tokens)]
        preprocessed_text=' '.join(tokens)
        return preprocessed_text
    

    The function is called here:

    #open the text data from disk location
    sms=open('C:/Users/Ray/Documents/BSU/Machine_learning/Natural_language_Processing_Pyhton_And_NLTK_Chap6/smsspamcollection/SMSSpamCollection')
    sms_data=[]
    sms_labels=[]
    csv_reader=csv.reader(sms,delimiter='\t')
    for line in csv_reader:
        #adding the sms_id
        sms_labels.append(line[0])
        #adding the cleaned text by calling the preprocessing method
        sms_data.append(preprocessing(line[1]))
    sms.close()
    

    Result:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-38-b42d443adaa6> in <module>()
          8     sms_labels.append(line[0])
          9     #adding the cleaned text by calling the preprocessing method
    ---> 10     sms_data.append(preprocessing(line[1]))
         11 sms.close()
    
    <ipython-input-37-69ef4cd83745> in preprocessing(text)
         12     #lemmatization
         13     lmtzr=WordNetLemmatizer()
    ---> 14     tokens=[lmtzr.lemmatize(word for word in tokens)]
         15     preprocessed_text=' '.join(tokens)
         16     return preprocessed_text
    
    ~\Anaconda3\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
         38 
         39     def lemmatize(self, word, pos=NOUN):
    ---> 40         lemmas = wordnet._morphy(word, pos)
         41         return min(lemmas, key=len) if lemmas else word
         42 
    
    ~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in _morphy(self, form, pos, check_exceptions)
       1798
       1799         # 1. Apply rules once to the input to get y1, y2, y3, etc.
    -> 1800         forms = apply_rules([form])
       1801
       1802         # 2. Return all that are in the database (and check the original too)

    ~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in apply_rules(forms)
       1777         def apply_rules(forms):
       1778             return [form[:-len(old)] + new
    -> 1779                     for form in forms
       1780                     for old, new in substitutions
       1781                     if form.endswith(old)]

    ~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in <listcomp>(.0)
       1779                     for form in forms
       1780                     for old, new in substitutions
    -> 1781                     if form.endswith(old)]
       1782
       1783         def filter_forms(forms):
    
    AttributeError: 'generator' object has no attribute 'endswith'
    

    I believe the error comes from the source code of nltk.corpus.reader.wordnet.

    The full source code can be seen on the nltk documentation page. It is too long to post here, but here is the original link :

    Thanks for your help.

    1 reply  |  as of 6 years ago
        1
  •  2
  •   bruno desthuilliers    6 years ago

    The error message and the traceback point you to the source of the problem:

    in preprocessing(text)
         12     #lemmatization
         13     lmtzr=WordNetLemmatizer()
    ---> 14     tokens=[lmtzr.lemmatize(word for word in tokens)]
         15     preprocessed_text=' '.join(tokens)
         16     return preprocessed_text

    ~\Anaconda3\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
         38
         39     def lemmatize(self, word, pos=NOUN):

    Obviously, from the function's signature (word, not words) as well as from the error ("no attribute 'endswith'"; endswith() is actually a str method), lemmatize() expects a single word, but here:

    tokens=[lmtzr.lemmatize(word for word in tokens)]
    

    you are passing a generator expression.
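    To make that concrete, here is a minimal stdlib-only sketch (no nltk needed): a generator expression is one single object, not a sequence of strings, so it has none of the str methods that lemmatize() relies on.

    ```python
    tokens = ["cats", "running", "dogs"]

    # A generator expression evaluates to a single 'generator' object;
    # it is not a str, so it has no str methods such as endswith().
    gen = (word for word in tokens)
    print(type(gen).__name__)        # generator
    print(hasattr(gen, "endswith"))  # False

    # Each individual token, of course, is a str and does have the method:
    print(tokens[0].endswith("s"))   # True
    ```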

    What you want is:

    tokens = [lmtzr.lemmatize(word) for word in tokens]
    

    NB: you mention:

    I believe the error comes from the source code of nltk.corpus.reader.wordnet

    The error is indeed raised in this package, but it "comes from" (in the sense of "is caused by") your code passing the wrong kind of argument ;)
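    You can reproduce the failure without nltk at all. In the sketch below, stem() is a made-up stand-in for lemmatize(): any function that calls a str method on its argument fails the same way when handed a generator, and works once the comprehension applies it word by word.

    ```python
    def stem(word):
        # Hypothetical stand-in for lemmatize(): assumes its argument is a str.
        return word[:-1] if word.endswith("s") else word

    tokens = ["cats", "dog"]

    # Buggy call: the whole generator is passed as one argument.
    try:
        stem(word for word in tokens)
    except AttributeError as exc:
        print(exc)  # 'generator' object has no attribute 'endswith'

    # Fixed call: stem() is applied to each word in turn.
    print([stem(word) for word in tokens])  # ['cat', 'dog']
    ```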

    Hope this helps you debug this kind of problem by yourself next time.