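I need to split each line on the slash and then report the tags. This is the Hunspell dictionary format. I tried to find a class on GitHub that does this, but couldn't find one.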
# vi test.txt
test/S
boy
girl/SE
home/
house/SE123
man/E
country
wind/ES
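Here is a minimal sketch of the splitting step with `str.partition`. It assumes one entry per line and single-character flags. Hunspell can also declare long or numeric flags with `FLAG` in the `.aff` file, and this sketch does not handle those. A line with no slash, or with nothing after it, gets an empty flag string. `parse_dic_line` is a hypothetical helper name.

```python
def parse_dic_line(line):
    """Split a 'word/FLAGS' line into (word, flags); flags may be ''."""
    # partition splits on the first '/' only and never raises,
    # so 'boy' -> ('boy', '') and 'home/' -> ('home', '').
    word, _, flags = line.rstrip().partition('/')
    return word, flags


sample = ["test/S", "boy", "girl/SE", "home/", "house/SE123",
          "man/E", "country", "wind/ES"]
for line in sample:
    print(parse_dic_line(line))
```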
Code:
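from collections import defaultdict

myl = defaultdict(list)
with open('test.txt') as f:
    for l in f:
        l = l.rstrip()
        try:
            tags = l.split('/')[1]               # flags after the slash
            myl[tags].append(l.split('/')[0])    # group by the full flag string
            for t in tags:                       # ...and by each single flag
                myl[t].append(l.split('/')[0])
        except IndexError:                       # no slash on the line, e.g. "boy"
            pass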
Output:
defaultdict(list,
{'S': ['test', 'test', 'girl', 'house', 'wind'],
'SE': ['girl'],
'E': ['girl', 'house', 'man', 'man', 'wind'],
'': ['home'],
'SE123': ['house'],
'1': ['house'],
'2': ['house'],
'3': ['house'],
'ES': ['wind']})
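The "SE" group should contain three words: "girl", "wind" and "house". There should be no "ES" group, because it is covered by "SE", and "SE123" should stay as it is. How can I achieve this?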
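One way to read this requirement is as a subset test. A word belongs to the "SE" group when both S and E appear among its flags, in any order. Here is a minimal sketch under that assumption, which needs no precomputed groups at all. `words_with_flags` is a hypothetical name.

```python
def words_with_flags(lines, query):
    """Return words whose flags contain every flag in `query`, in any order."""
    wanted = set(query)
    result = []
    for line in lines:
        word, _, flags = line.rstrip().partition('/')
        # subset test: 'SE', 'ES' and 'SE123' all contain both S and E
        if wanted <= set(flags):
            result.append(word)
    return result


sample = ["test/S", "boy", "girl/SE", "home/", "house/SE123",
          "man/E", "country", "wind/ES"]
print(words_with_flags(sample, "SE"))  # ['girl', 'house', 'wind']
print(words_with_flags(sample, "ES"))  # same result: flag order does not matter
```

Because it compares sets, "ES" and "SE" are the same query, so no separate "ES" group can appear.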
Update:
I have managed to add bigrams, but how do I add 3-, 4- and 5-grams?
from collections import defaultdict
import nltk

myl = defaultdict(list)
with open('hi_IN.dic') as f:
    for l in f:
        l = l.rstrip()
        try:
            tags = l.split('/')[1]
            # group by the full flag string, sorted so "ES" and "SE" coincide
            ntags = ''.join(sorted(tags))
            myl[ntags].append(l.split('/')[0])
            # ...by each single flag
            for t in tags:
                myl[t].append(l.split('/')[0])
            # ...and by each pair of *adjacent* flags, sorted
            bigrm = list(nltk.bigrams(list(tags)))
            nlist = [x + y for x, y in bigrm]
            for t1 in nlist:
                t1a = ''.join(sorted(t1))
                myl[t1a].append(l.split('/')[0])
        except IndexError:  # no slash on the line
            pass
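Bigrams stop short because `nltk.bigrams` only yields *adjacent* flags. The same is true of `nltk.ngrams(seq, n)` for larger n. So for `PRTU` the pair `PU` is never produced, whatever n you pick. The grouping seems to need every combination of flags of a given size, and `itertools.combinations` provides exactly that. Below is a small comparison that uses plain slicing in place of nltk, so it runs without extra packages. `contiguous_ngrams` and `flag_combinations` are hypothetical helpers.

```python
from itertools import combinations


def contiguous_ngrams(tags, n):
    """Adjacent runs of length n, like nltk n-grams joined back into strings."""
    return [tags[i:i + n] for i in range(len(tags) - n + 1)]


def flag_combinations(tags, n):
    """Every size-n selection of flags, keeping their original order."""
    return [''.join(c) for c in combinations(tags, n)]


tags = 'PRTU'
print(contiguous_ngrams(tags, 2))  # ['PR', 'RT', 'TU']
print(flag_combinations(tags, 2))  # ['PR', 'PT', 'PU', 'RT', 'RU', 'TU']
print(flag_combinations(tags, 3))  # ['PRT', 'PRU', 'PTU', 'RTU']
```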
I thought it would help to sort the tags in the source file first:
with open('test1.txt', 'w') as nf:
    with open('test.txt') as f:
        for l in f:
            l = l.rstrip()
            try:
                tags = l.split('/')[1]
            except IndexError:   # no slash: copy the line unchanged
                nline = l
            else:                # sort the flags, e.g. "girl/SE" -> "girl/ES"
                ntags = ''.join(sorted(tags))
                nline = l.split('/')[0] + '/' + ntags
            nf.write(nline + '\n')
This creates a new file, test1.txt, with sorted tags. But the problem with trigrams and beyond is still not solved.
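Sorting the flags makes `ES` and `SE` the same key. It does not bring non-adjacent flags together, though, so higher-order n-grams still miss combinations. Here is a sketch that indexes every word under every sorted combination of its flags, up to a hypothetical `max_size` parameter (for example 5 for the 3-, 4- and 5-grams asked about above). It assumes single-character flags. `build_flag_index` and `lookup` are illustrative names. A word with k distinct flags produces 2^k - 1 keys when `max_size` is unlimited, so large flag sets get expensive.

```python
from collections import defaultdict
from itertools import combinations


def build_flag_index(lines, max_size=None):
    """Map each sorted flag combination to the words that carry all of those flags."""
    index = defaultdict(list)
    for line in lines:
        word, _, flags = line.rstrip().partition('/')
        # sort and de-duplicate so 'SE' and 'ES' produce the same keys
        flags = ''.join(sorted(set(flags)))
        if not flags:
            continue  # 'boy', 'home/': nothing to index
        limit = len(flags) if max_size is None else min(max_size, len(flags))
        for size in range(1, limit + 1):
            for combo in combinations(flags, size):
                # each key is produced once per word, so no duplicate entries
                index[''.join(combo)].append(word)
    return index


def lookup(index, query):
    """Look up a group regardless of the order the flags are written in."""
    return index.get(''.join(sorted(set(query))), [])


sample = ["test/S", "boy", "girl/SE", "home/", "house/SE123",
          "man/E", "country", "wind/ES"]
idx = build_flag_index(sample)
print(lookup(idx, 'SE'))     # ['girl', 'house', 'wind']
print(lookup(idx, 'SE123'))  # ['house']
print(lookup(idx, 'S'))      # ['test', 'girl', 'house', 'wind']
```

Keys are stored in sorted order (`ES`, `123ES`), so queries should go through `lookup` rather than indexing the dictionary directly.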
I downloaded a sample file:
!wget https://raw.githubusercontent.com/wooorm/dictionaries/master/dictionaries/en-US/index.dic
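Running grep on the sorted-flag copy of the file (index1.dic) gives the expected report. CPU/M and GPU match only because the word itself contains P and U, not because of their flags.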
!grep 'P.*U' index1.dic
CPU/M
GPU
aware/PU
cleanly/PRTU
common/PRTUY
conscious/PUY
easy/PRTU
faithful/PUY
friendly/PRTU
godly/PRTU
grateful/PUY
happy/PRTU
healthy/PRTU
holy/PRTU
kind/PRTUY
lawful/PUY
likely/PRTU
lucky/PRTU
natural/PUY
obtrusive/PUY
pleasant/PTUY
prepared/PU
reasonable/PU
responsive/PUY
righteous/PU
scrupulous/PUY
seemly/PRTU
selfish/PUY
timely/PRTU
truthful/PUY
wary/PRTU
wholesome/PU
willing/PUY
worldly/PTU
worthy/PRTU
The Python report, which uses bigrams on the sorted-tag file, does not contain all of the words above:
myl['PU']
['aware',
'aware',
'conscious',
'faithful',
'grateful',
'lawful',
'natural',
'obtrusive',
'prepared',
'prepared',
'reasonable',
'reasonable',
'responsive',
'righteous',
'righteous',
'scrupulous',
'selfish',
'truthful',
'wholesome',
'wholesome',
'willing']
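The list above has two separate problems. First, words such as `aware/PU` appear twice, because the full sorted flag string `PU` and the bigram `PU` are the same key. Second, the `PRTU`/`PRTUY` words are missing, because P and U are never adjacent in those flag strings. The self-contained sketch below reproduces the bigram indexing on a few of the grep lines, with `zip` standing in for `nltk.bigrams`. It then compares the result with a plain subset test. Unlike `grep 'P.*U'`, the subset test looks only at the flags, so it skips `CPU/M` and `GPU`.

```python
from collections import defaultdict

lines = ["CPU/M", "GPU", "aware/PU", "cleanly/PRTU", "kind/PRTUY"]

# Reproduce the bigram approach from the question (zip in place of nltk.bigrams).
bigram_index = defaultdict(list)
for line in lines:
    word, sep, tags = line.partition('/')
    if not sep:
        continue  # no slash, like the IndexError branch in the question
    bigram_index[''.join(sorted(tags))].append(word)      # full flag string
    for t in tags:
        bigram_index[t].append(word)                       # single flags
    for x, y in zip(tags, tags[1:]):                       # adjacent pairs only
        bigram_index[''.join(sorted(x + y))].append(word)

# Subset test on the flag part only.
subset_hits = [w for w, _, f in (l.partition('/') for l in lines)
               if {'P', 'U'} <= set(f)]

print(bigram_index['PU'])  # ['aware', 'aware']
print(subset_hits)         # ['aware', 'cleanly', 'kind']
```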