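For a similar study, I am working with data that has already been tokenised (not by spaCy). I need to use these tokens as input so that the same data is used across the board. I would like to feed these tokens into spaCy's tagger, but the following fails: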
import spacy
nlp = spacy.load('en', disable=['tokenizer', 'parser', 'ner', 'textcat'])
sent = ['I', 'like', 'yellow', 'bananas']
doc = nlp(sent)
for i in doc:
    print(i)
with the following traceback:
Traceback (most recent call last):
  File "C:/Users/bmvroy/.PyCharm2018.2/config/scratches/scratch_6.py", line 6, in <module>
    doc = nlp(sent)
  File "C:\Users\bmvroy\venv\lib\site-packages\spacy\language.py", line 346, in __call__
    doc = self.make_doc(text)
  File "C:\Users\bmvroy\venv\lib\site-packages\spacy\language.py", line 378, in make_doc
    return self.tokenizer(text)
TypeError: Argument 'string' has incorrect type (expected str, got list)
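First, I am not sure why spaCy tries to tokenise the input, since I disabled the tokenizer in the load() statement. Second, this is evidently not the way to go.
I am looking for a way to feed the tagger a list of tokens. Is that possible with spaCy?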
I tried the solution provided by @aab, combined with information from the documentation, but to no avail:
from spacy.tokens import Doc
from spacy.lang.en import English
from spacy.pipeline import Tagger
nlp = English()
tagger = Tagger(nlp.vocab)
words = ['Listen', 'up', '.']
spaces = [True, False, False]
doc = Doc(nlp.vocab, words=words, spaces=spaces)
processed = tagger(doc)
print(processed)
This code does not run and gives the following error:
    processed = tagger(doc)
  File "pipeline.pyx", line 426, in spacy.pipeline.Tagger.__call__
  File "pipeline.pyx", line 438, in spacy.pipeline.Tagger.predict
AttributeError: 'bool' object has no attribute 'tok2vec'
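For reference, here is a minimal sketch of one approach that may work, under a few assumptions. The tokenizer is not one of the components listed in nlp.pipeline, so the disable argument does not affect it, and nlp(...) always calls nlp.tokenizer on its input. The second error appears to come from Tagger(nlp.vocab) creating a component with no trained model attached. The sketch builds the Doc by hand from the existing tokens and then applies the components of an already loaded pipeline one by one. The helper name run_pipeline_on_tokens is hypothetical and not part of spaCy.

```python
import spacy
from spacy.tokens import Doc


def run_pipeline_on_tokens(nlp, words):
    """Build a Doc from pre-tokenised words and apply nlp's pipeline components.

    Hypothetical helper for illustration; it is not part of spaCy itself.
    """
    # Constructing the Doc directly skips nlp.tokenizer, so the given
    # token boundaries are kept exactly as supplied.
    doc = Doc(nlp.vocab, words=words)
    # nlp.pipeline is a list of (name, component) pairs; each component
    # takes a Doc and returns the (modified) Doc.
    for name, component in nlp.pipeline:
        doc = component(doc)
    return doc
```

With a trained model loaded (assuming one is installed, for example via spacy.load('en') as in the code above, or whatever model name the installed version uses), calling run_pipeline_on_tokens(nlp, ['I', 'like', 'yellow', 'bananas']) should return a Doc whose tokens carry the tagger's output in token.tag_. With a blank pipeline the loop has nothing to apply and the Doc simply keeps the given tokens.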