
Wrong logic in a for loop

  • Stanleyrr · Tech Community · 7 years ago

        a1 = ['MAGIC', 'BUS']
        a2 = ['TRANSPORTATION', 'SERVICES', 'GROUP']
    

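    I want to compare each word in list a1 with each word in list a2 and use nltk to get a semantic similarity score for every pair of words. I know how to compare two words manually with the wn.path_similarity(word_1_in_a1, word_1_in_a2) function, but I want to be able to do it in a for loop.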
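For context, NLTK describes path_similarity as a score between 0 and 1 based on the shortest path connecting two senses in the is-a (hypernym/hyponym) taxonomy, computed as 1 / (path length + 1). Identical senses therefore score 1.0, and a score of 1/7 (about 0.1429) means the two senses are 6 edges apart. Below is a simplified sketch of that formula on a tiny hand-made taxonomy; TOY_HYPERNYMS and both helper functions are hypothetical illustrations, not WordNet data and not NLTK's implementation. In a tree-shaped taxonomy like this one, the shortest path always runs through the closest shared ancestor.

```python
from collections import deque
from typing import Optional

# Hypothetical is-a edges (child -> parent); NOT real WordNet data.
TOY_HYPERNYMS = {
    "bus": "vehicle",
    "car": "vehicle",
    "vehicle": "entity",
    "magic": "activity",
    "activity": "entity",
}


def shortest_path_length(a: str, b: str) -> Optional[int]:
    """Breadth-first search over the is-a edges, walked in both directions."""
    neighbours = {}
    for child, parent in TOY_HYPERNYMS.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two senses are not connected


def toy_path_similarity(a: str, b: str) -> Optional[float]:
    """1 / (shortest path length + 1); None when no path exists."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1 / (dist + 1)


print(toy_path_similarity("bus", "car"))    # 2 edges: bus-vehicle-car
print(toy_path_similarity("bus", "magic"))  # 4 edges, via the shared ancestor "entity"
```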

    Here is my script:

        from nltk.corpus import wordnet as wn

        company_broken_down = {}   # index in a1 -> first noun sense ('.n.01')
        category_broken_down = {}  # index in a2 -> first noun sense ('.n.01')
        semantic_sim = {}          # collected similarity scores

        if len(a1) > len(a2):
            for x in range(len(a1)):
                company_broken_down[x] = wn.synset(a1[x] + '.n.01')
                for y in range(len(a2)):
                    category_broken_down[y] = wn.synset(a2[y] + '.n.01')
                semantic_sim[x] = wn.path_similarity(company_broken_down[x], category_broken_down[y])
        else:
            for y in range(len(a2)):
                category_broken_down[y] = wn.synset(a2[y] + '.n.01')
                for x in range(len(a1)):
                    company_broken_down[x] = wn.synset(a1[x] + '.n.01')
                semantic_sim[y] = wn.path_similarity(company_broken_down[x], category_broken_down[y])

        print(semantic_sim)
    

    After running the script above, I get {0: 0.14285714285714285, 1: 0.058823529411764705, 2: 0.09091}, which is the result of comparing the word 'BUS' in a1 with each word in a2. The first word in a1, 'MAGIC', is never used.
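The same behaviour can be reproduced without WordNet. Python keeps a loop variable's last value after the loop finishes, so a statement placed after the inner loop (rather than inside it) only ever sees the final x. Below is a minimal sketch of the else branch, which is the one that runs here because len(a1) < len(a2); fake_similarity is a hypothetical stand-in for wn.path_similarity that simply returns the pair it was asked about.

```python
a1 = ['MAGIC', 'BUS']
a2 = ['TRANSPORTATION', 'SERVICES', 'GROUP']


def fake_similarity(w1, w2):
    # Hypothetical stand-in for wn.path_similarity: returns the pair it was given.
    return (w1, w2)


semantic_sim = {}
for y in range(len(a2)):
    for x in range(len(a1)):
        pass  # the inner loop only advances x; nothing is stored in here
    # Runs once per y, after the inner loop has finished, so x is still len(a1) - 1.
    semantic_sim[y] = fake_similarity(a1[x], a2[y])

print(semantic_sim)
# {0: ('BUS', 'TRANSPORTATION'), 1: ('BUS', 'SERVICES'), 2: ('BUS', 'GROUP')}
```

Every stored pair starts with 'BUS', which matches the three scores reported above.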

    1 answer  |  last activity 7 years ago
  • tshree · 7 years ago

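    Each semantic_sim slot only ever receives one comparison: the assignment sits after the inner loop instead of inside it, so it runs once per outer iteration with the inner loop variable stuck at its last value, and keying by a single index leaves room for only one result per outer word. Try the following code, which moves the assignment into the inner loop and gives semantic_sim len(a1)*len(a2) entries, one per word pair: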

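        from nltk.corpus import wordnet as wn

        company_broken_down = {}
        category_broken_down = {}
        semantic_sim = {}  # len(a1) * len(a2) entries, one per word pair

        if len(a1) > len(a2):
            for x in range(len(a1)):
                company_broken_down[x] = wn.synset(a1[x] + '.n.01')
                for y in range(len(a2)):
                    category_broken_down[y] = wn.synset(a2[y] + '.n.01')
                    # inside the inner loop: a distinct key for every (x, y) pair
                    semantic_sim[x*len(a2) + y] = wn.path_similarity(company_broken_down[x], category_broken_down[y])
        else:
            for y in range(len(a2)):
                category_broken_down[y] = wn.synset(a2[y] + '.n.01')
                for x in range(len(a1)):
                    company_broken_down[x] = wn.synset(a1[x] + '.n.01')
                    # same idea with the roles swapped: key = y*len(a1) + x
                    semantic_sim[y*len(a1) + x] = wn.path_similarity(company_broken_down[x], category_broken_down[y])

        print(semantic_sim)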
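The key x*len(a2) + y flattens the (x, y) grid into consecutive integers, the same row-major layout used to store a 2-D array in one dimension, so no two pairs share a key and divmod(key, len(a2)) recovers (x, y). The else branch uses y*len(a1) + x instead, so which pair a given key refers to depends on which list is longer. A quick check with the question's lists, followed by a sketch of an alternative that keys the results by the word pair itself; toy_score is a hypothetical stand-in for a WordNet-based scorer.

```python
from itertools import product

a1 = ['MAGIC', 'BUS']
a2 = ['TRANSPORTATION', 'SERVICES', 'GROUP']

# Row-major flattening from the x-outer branch: key = x * len(a2) + y.
flat = {x * len(a2) + y: (a1[x], a2[y])
        for x in range(len(a1)) for y in range(len(a2))}
print(sorted(flat))  # [0, 1, 2, 3, 4, 5]: one key per pair, nothing overwritten

# divmod undoes the flattening: key 4 -> (x, y) = (1, 1).
x, y = divmod(4, len(a2))
print(x, y, flat[4])  # 1 1 ('BUS', 'SERVICES')


def toy_score(w1, w2):
    # Hypothetical scorer for this demo only: 1.0 when both words have the same length.
    return 1 / (abs(len(w1) - len(w2)) + 1)


# Alternative: key by the word pair itself, so no index arithmetic is needed
# and the result does not depend on which list is longer.
by_pair = {(w1, w2): toy_score(w1, w2) for w1, w2 in product(a1, a2)}
print(by_pair[('MAGIC', 'GROUP')])  # 1.0
```

To use WordNet instead, a scorer such as lambda w1, w2: wn.path_similarity(wn.synset(w1 + '.n.01'), wn.synset(w2 + '.n.01')) makes the same per-pair call as the loops above. Keep in mind that path_similarity returns None when no connecting path between the two senses is found.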