
Generating bigrams with NLTK

n-gram nltk python

Nikhil Raghavendra · Technical community · 8 years ago

I'm trying to produce a list of bigrams for a given sentence. For example, if I type

    To be or not to be

I want the program to produce

    to be, be or, or not, not to, to be

I tried the following code, but all it gives me is

    <generator object bigrams at 0x0000000009231360>

Here is my code:

    import nltk
    text = "To be or not to be"  # the input sentence, as a plain string
    bigrm = nltk.bigrams(text)
    print(bigrm)

So how can I get what I want? I want a list of the word pairs above (to be, be or, or not, not to, to be).

3 answers | last activity 8 years ago

Ilja Everilä · 8 years ago

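nltk.bigrams() returns an iterator (specifically, a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to build the bigrams from, so you have to split the text before passing it in (if you haven't already):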

bigrm = list(nltk.bigrams(text.split()))
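To see why both points matter, here is a minimal plain-Python sketch that runs without NLTK installed. The helper `bigrams_sketch` is hypothetical: it only imitates the behaviour described above (a generator pairing each item with the next one) and is not NLTK's actual implementation.

```python
def bigrams_sketch(sequence):
    """Hypothetical stand-in for nltk.bigrams(): yield (item, next_item) pairs."""
    items = list(sequence)
    # zip stops at the shorter input, so n items give n - 1 pairs
    yield from zip(items, items[1:])


text = "To be or not to be"

# Calling a generator function only creates a generator object,
# which is why printing it shows "<generator object ...>"
print(bigrams_sketch(text))

# A raw string is a sequence of characters, so this pairs characters
char_pairs = list(bigrams_sketch(text))
print(char_pairs[:3])  # [('T', 'o'), ('o', ' '), (' ', 'b')]

# Splitting into words first gives the word pairs the question asks for
word_pairs = list(bigrams_sketch(text.split()))
print(word_pairs)
```

Wrapping the call in list() is what turns the lazy generator into a concrete list of tuples.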

To print them out separated by commas, you could do this (in Python 3):

print(*map(' '.join, bigrm), sep=', ')

On Python 2, for example:

print ', '.join(' '.join((a, b)) for a, b in bigrm)

Note that if you only want to print them, you don't need to build a list at all; the iterator is enough.
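The catch is that an iterator can be consumed only once. A small sketch, using plain zip() as a stand-in for nltk.bigrams() (both return one-pass iterators), shows printing straight from the iterator and what happens if you try to reuse it afterwards:

```python
tokens = "to be or not to be".split()

# zip() returns a one-shot iterator, standing in here for nltk.bigrams(tokens)
pairs = zip(tokens, tokens[1:])

# The join consumes the iterator directly; no intermediate list is built
line = ", ".join(" ".join(pair) for pair in pairs)
print(line)  # to be, be or, or not, not to, to be

# The iterator is now exhausted, so a second pass produces nothing
leftover = list(pairs)
print(leftover)  # []
```

If you need the bigrams more than once, convert them with list() first.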

Steffi Keran Rani J · 7 years ago

The following code generates the bigrams of a given sentence:

>>> import nltk
>>> from nltk.tokenize import word_tokenize
>>> text = "to be or not to be"
>>> tokens = word_tokenize(text)
>>> bigrm = nltk.bigrams(tokens)
>>> print(*map(' '.join, bigrm), sep=', ')
to be, be or, or not, not to, to be
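Using a tokenizer instead of str.split() pays off once the sentence contains punctuation: nltk.word_tokenize separates marks such as commas and question marks from the words, while split() leaves them attached. The sketch below substitutes a hypothetical regex-based `simple_tokenize` for word_tokenize (it is far cruder than NLTK's tokenizer). It also lowercases the text, since the question's expected output begins with "to be" rather than "To be".

```python
import re


def simple_tokenize(text):
    """Hypothetical, much simpler stand-in for nltk.word_tokenize():
    splits runs of word characters from single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


text = "To be, or not to be?"

# str.split() leaves punctuation attached to the neighbouring word
print(text.split())  # ['To', 'be,', 'or', 'not', 'to', 'be?']

# Separating punctuation yields cleaner tokens
print(simple_tokenize(text))  # ['To', 'be', ',', 'or', 'not', 'to', 'be', '?']

# Lowercase and drop punctuation-only tokens before pairing adjacent words
words = [tok for tok in simple_tokenize(text.lower()) if tok.isalnum()]
result = ", ".join(" ".join(pair) for pair in zip(words, words[1:]))
print(result)  # to be, be or, or not, not to, to be
```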

Shashwat · 3 years ago

Quite late, but here's another way:

>>> from nltk.util import ngrams
>>> text = "I am batman and I like coffee"
>>> _1gram = text.split(" ")
>>> _2gram = [' '.join(e) for e in ngrams(_1gram, 2)]
>>> _3gram = [' '.join(e) for e in ngrams(_1gram, 3)]
>>> 
>>> _1gram
['I', 'am', 'batman', 'and', 'I', 'like', 'coffee']
>>> _2gram
['I am', 'am batman', 'batman and', 'and I', 'I like', 'like coffee']
>>> _3gram
['I am batman', 'am batman and', 'batman and I', 'and I like', 'I like coffee']
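The same sliding-window idea works for any n. As a sketch, here is a pure-Python helper, `ngrams_sketch` (hypothetical, not part of NLTK), that reproduces the output above. nltk.util.ngrams additionally offers optional padding, which this sketch leaves out. It uses split() rather than split(" "), because the latter produces empty-string tokens when words are separated by more than one space.

```python
def ngrams_sketch(tokens, n):
    """Hypothetical stand-in for nltk.util.ngrams(tokens, n) without padding."""
    tokens = list(tokens)
    # Zip the list against copies of itself shifted by 1, 2, ..., n - 1 positions
    return list(zip(*(tokens[i:] for i in range(n))))


text = "I am batman and I like coffee"
tokens = text.split()

bigrams = [" ".join(g) for g in ngrams_sketch(tokens, 2)]
trigrams = [" ".join(g) for g in ngrams_sketch(tokens, 3)]
print(bigrams)
print(trigrams)

# split(" ") keeps empty strings between repeated spaces; split() does not
print("I  am".split(" "))  # ['I', '', 'am']
print("I  am".split())     # ['I', 'am']
```

Because zip stops at its shortest argument, a sequence with fewer than n tokens simply yields no n-grams.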