
Problem retraining a FastText model from a FastText .bin file with Gensim: 'FastTextTrainables' object has no attribute 'syn1neg'

fasttext pre-trained-model gensim nlp python

Borja_042 · 5 years ago

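I'm trying to fine-tune a pretrained FastText model for my problem using the Gensim wrapper, but I'm running into trouble. I successfully load the model embeddings from the .bin file and then try to continue training, like this: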

from gensim.models.fasttext import FastText

# r_bin holds the path to the pretrained FastText .bin file
model = FastText.load_fasttext_format(r_bin)

# Two tokenized sentences to continue training on
sent = [['i', 'am', 'interested', 'on', 'SPGB'], ['SPGB', 'is', 'a', 'good', 'choice']]
model.build_vocab(sent, update=True)
model.train(sentences=sent, total_examples=len(sent), epochs=5)

No matter what I change, I keep getting this error over and over:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-91-6456730b1919> in <module>
      1 sent = [['i', 'am', 'interested', 'on', 'SPGB'], ['SPGB' 'is', 'a', 'good', 'choice']]
----> 2 model.build_vocab(sent, update=True)
      3 model.train(sentences=sent, total_examples = len(sent), epochs=5)

/opt/.../fasttext.py in build_vocab(self, sentences, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
    380         return super(FastText, self).build_vocab(
    381             sentences, update=update, progress_per=progress_per,
--> 382             keep_raw_vocab=keep_raw_vocab, trim_rule=trim_rule, **kwargs)
    383 
    384     def _set_train_params(self, **kwargs):

/opt/.../base_any2vec.py in build_vocab(self, sentences, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
    484             trim_rule=trim_rule, **kwargs)
    485         report_values['memory'] = self.estimate_memory(vocab_size=report_values['num_retained_words'])
--> 486         self.trainables.prepare_weights(self.hs, self.negative, self.wv, update=update, vocabulary=self.vocabulary)
    487 
    488     def build_vocab_from_freq(self, word_freq, keep_raw_vocab=False, corpus_count=None, trim_rule=None, update=False):

/opt/.../fasttext.py in prepare_weights(self, hs, negative, wv, update, vocabulary)
    752 
    753     def prepare_weights(self, hs, negative, wv, update=False, vocabulary=None):
--> 754         super(FastTextTrainables, self).prepare_weights(hs, negative, wv, update=update, vocabulary=vocabulary)
    755         self.init_ngrams_weights(wv, update=update, vocabulary=vocabulary)
    756 

/opt/.../word2vec.py in prepare_weights(self, hs, negative, wv, update, vocabulary)
   1402             self.reset_weights(hs, negative, wv)
   1403         else:
-> 1404             self.update_weights(hs, negative, wv)
   1405 
   1406     def seeded_vector(self, seed_string, vector_size):

/opt/.../word2vec.py in update_weights(self, hs, negative, wv)
   1452             self.syn1 = vstack([self.syn1, zeros((gained_vocab, self.layer1_size), dtype=REAL)])
   1453         if negative:
-> 1454             self.syn1neg = vstack([self.syn1neg, zeros((gained_vocab, self.layer1_size), dtype=REAL)])
   1455         wv.vectors_norm = None
   1456 

AttributeError: 'FastTextTrainables' object has no attribute 'syn1neg'

Thanks in advance for your help.


gojomo · 5 years ago

Thanks for the detailed code showing what you tried and the error you hit.

Are you sure you're using the latest Gensim release, gensim-3.8.3? I can't reproduce the error with your code.
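One way to confirm which Gensim the running environment actually has is sketched below. It relies only on the standard library's importlib.metadata; the helper names parse_version and installed_version are made up for this illustration, and the simple parser ignores pre-release suffixes.

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.8.3' into (3, 8, 3), dropping any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("gensim")
if found is None:
    print("gensim is not installed in this environment")
elif parse_version(found) < (3, 8, 3):
    print("gensim", found, "is older than 3.8.3; consider upgrading")
else:
    print("gensim", found)
```

Running this inside the same notebook kernel that raised the error matters, since a notebook can be bound to a different environment than the shell where a package was upgraded.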

Also: with gensim-3.8.3 you would see a warning like this:

DeprecationWarning: Call to deprecated 'load_fasttext_format' (use load_facebook_vectors (to use pretrained embeddings) or load_facebook_model (to continue training with the loaded full model, more RAM) instead).

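(Since you want to continue training, load_facebook_model() is the one for you. Using the older method shouldn't by itself cause the problem, but your environment should be using the latest Gensim, and your code should be upgraded to call the preferred method.)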
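A minimal sketch of the upgraded call, assuming Gensim 3.8 or later is installed. The wrapper name continue_training and the path "model.bin" are placeholders invented for this example; load_facebook_model is the function named in the deprecation warning.

```python
def continue_training(bin_path, sentences, epochs=5):
    """Load a Facebook FastText .bin model and train it further on `sentences`."""
    # Imported here so this file can be read or imported without gensim present.
    from gensim.models.fasttext import load_facebook_model

    model = load_facebook_model(bin_path)      # full model, suitable for more training
    model.build_vocab(sentences, update=True)  # register any words the model lacks
    # The corpus is passed positionally because, as far as I know, the keyword
    # name differs between Gensim 3.x (`sentences`) and 4.x (`corpus_iterable`).
    model.train(sentences, total_examples=len(sentences), epochs=epochs)
    return model


# Hypothetical usage; "model.bin" stands in for the asker's r_bin path:
# model = continue_training("model.bin", [["i", "am", "interested", "on", "SPGB"]])
```

If only lookups of pretrained vectors are needed, the warning points to load_facebook_vectors instead, which it describes as the option for using pretrained embeddings.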

Further notes:

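Because your tiny test text contains no new words, build_vocab(..., update=True) isn't strictly necessary and doesn't do anything relevant: the model's known vocabulary is the same before and after. (Of course, it would be different if you were actually using new sentences with new words, but your tiny example doesn't yet really test vocabulary expansion.)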
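To check whether a corpus would expand the vocabulary at all, something like the following could be run first. It is a plain-Python sketch: unseen_tokens is a made-up helper, and the known set is a stand-in vocabulary. With a real model the second argument might be model.wv.vocab in Gensim 3.x or model.wv.key_to_index in 4.x.

```python
def unseen_tokens(sentences, known_vocab):
    """Return tokens missing from `known_vocab`, in first-seen order, without repeats."""
    seen = set()
    missing = []
    for sentence in sentences:
        for token in sentence:
            if token not in known_vocab and token not in seen:
                seen.add(token)
                missing.append(token)
    return missing


# Stand-in vocabulary (an assumption for illustration, not read from a real model).
known = {"i", "am", "interested", "on", "is", "a", "good", "choice", "SPGB"}
sent = [["i", "am", "interested", "on", "SPGB"], ["SPGB", "is", "a", "good", "choice"]]

print(unseen_tokens(sent, known))               # []
print(unseen_tokens([["buy", "SPGB"]], known))  # ['buy']
```

An empty result means the update=True pass has nothing to add for that corpus.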

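This style of training some new data, or a small number of new words, into an existing model is full of difficult tradeoffs.

In particular, to the extent that your new data includes only your new words and some subset of the original model's words, only those new-data words will receive training updates, based on their usages in the new data. Those updates move the affected vectors to new positions, while the vectors of words absent from the new data stay where they were.

So neither your new words nor the old words that received new training will remain inherently comparable to the old words that don't appear in the new data. Essentially, only words that are trained together are necessarily moved to usefully contrasting positions.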
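The drift can be pictured with a toy calculation. This is not FastText training: the 2-D vectors, the words chosen, and the nudge helper are all invented for illustration, and the only point is that moving one vector while leaving another frozen changes their similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nudge(vec, target, rate):
    """Move `vec` a fraction `rate` of the way toward `target`."""
    return [a + rate * (b - a) for a, b in zip(vec, target)]


# Made-up vectors, not taken from any real model.
vectors = {
    "good": [1.0, 0.0],   # appears in the new data, so it gets updated
    "great": [0.9, 0.1],  # absent from the new data, so it stays frozen
}
before = cosine(vectors["good"], vectors["great"])

# Pretend the new corpus pulls "good" toward a different region of the space.
for _ in range(5):
    vectors["good"] = nudge(vectors["good"], [0.0, 1.0], rate=0.3)

after = cosine(vectors["good"], vectors["great"])
print(round(before, 3), round(after, 3))
```

In this toy the similarity drops only because one side moved; nothing about the frozen word changed, which is the sense in which the two stop being inherently comparable.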