
Reading documents from files when using the sklearn.feature_extraction.text CountVectorizer

scikit-learn python-2.7

user1603472 · 11 years ago

I can use the code from the example in the documentation, where the input to fit_transform() is a list of sentences, i.e.:

corpus = [
   'this is the first document',
   'this is the second second document',
   'and the third one',
   'is this the first document?'
]

X = vectorizer.fit_transform(corpus)

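and get the expected output. However, when I try to replace the corpus with a list of files or file objects, which the documentation suggests might be possible:

fit(raw_documents, y=None)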

Learn a vocabulary dictionary of all tokens in the raw documents.
Parameters :    
raw_documents : iterable
    An iterable which yields either str, unicode or file objects.
Returns :   
self :

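So I think I am missing something in my understanding of the pipeline. Given a directory of files that I want to run through CountVectorizer, how do I do it? If I pass a list of file objects, such as [open(file, 'r')], I get an error saying that the file object has no lower function.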

1 answer | 11 years ago

Fred Foo · 11 years ago

Set the vectorizer's input constructor parameter to either 'filename' or 'file'. Its default value is 'content', which assumes you have already read the files into memory.
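
A minimal sketch of both settings, assuming a reasonably recent scikit-learn. The file names and the temporary directory are hypothetical and exist only to make the example self-contained; in practice you would list the paths in your own directory. With the default input='content', the vectorizer treats each item as a string and lowercases it, which is why passing file objects fails on lower.

```python
import os
import tempfile

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample documents, written to a temporary directory so the
# sketch runs on its own; a real corpus directory would already exist.
texts = {
    "doc1.txt": "this is the first document",
    "doc2.txt": "this is the second second document",
    "doc3.txt": "and the third one",
}
tmp_dir = tempfile.mkdtemp()
for name, text in texts.items():
    with open(os.path.join(tmp_dir, name), "w") as fh:
        fh.write(text)

# Sort the paths so that each row of the matrix maps to a known file.
paths = sorted(os.path.join(tmp_dir, name) for name in os.listdir(tmp_dir))

# input='filename': the vectorizer opens and reads each path itself.
vectorizer = CountVectorizer(input="filename")
X = vectorizer.fit_transform(paths)

# input='file': the vectorizer calls read() on each open file object.
handles = [open(p, "rb") for p in paths]
try:
    X_from_files = CountVectorizer(input="file").fit_transform(handles)
finally:
    for fh in handles:
        fh.close()

print(X.shape)  # one row per file, one column per vocabulary term
```

Passing paths with input='filename' is usually the simpler choice for a whole directory, since the vectorizer opens and closes each file for you and you never hold many file handles open at once.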
