
Computing the dot product of two dictionaries with tuple values in Python

  •  2
  • artemis Roberto  · Tech Community  · 5 years ago

    I have two dictionaries like this:

    dict_of_items = {1: [('dog', 3), ('bird', 0)], 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)], 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)], 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)], 5: [('egret', 4), ('bird', 0)], 6: [('bird', 0)], 7: [('dog', 5), ('bird', 0)], 8: [('bird', 0), ('aardvark', 1)]}
    
    dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}
    

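    I need to compute the dot product of dict_of_search with each key of dict_of_items, then store the resulting dot product values and keep track of them by key. What I mean is:

    For key 1 of dict_of_items and the single entry of dict_of_search, the vectors are: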

    |      | dict_of_items_1 | dict_of_search |
    |:----:|:---------------:|:--------------:|
    | bird |        0        |        0       |
    |  dog |        3        |        1       |
    |  cat |        0        |        3       |
    

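    So my dot product is: 3

    The desired result is a dictionary that maps each key of dict_of_items to its dot product with dict_of_search (which will only ever contain one entry), sorted by dot product in descending order.

    However, I don't know how to turn the dictionaries into two arrays so that I can do the vector calculation, especially when one of the words is absent from one side. In the example above, for instance, how should I handle cat not appearing under key 1 of dict_of_items?

    I tried something like this with numpy: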

    import numpy as numpy
    
    def main():
        test_arr_1 = [1,2,3]
        test_arr_2 = [3,2,6]
    
        first_dot_product = numpy.dot(test_arr_1, test_arr_2)
    
        print("First Example: ", first_dot_product)
    
        test_arr_3 = [3,0,1]
        test_arr_4 = [2,10]
    
        second_dot_product = numpy.dot(test_arr_3, test_arr_4)
    
        print("Second Example Missing Value: ", second_dot_product)
    
    main()
    

    But this fails because the two vectors do not have the same length:

    ValueError: shapes (3,) and (2,) not aligned: 3 (dim 0) != 2 (dim 0)
    

    I also tried reformatting the dictionary values into lists:

    def main():
        dict_of_items = {'1': [('bird', 0), ('dog', 3), ('egret', 2), ('bird', 0), ('aardvark', 1), ('cat', 3), ('dog', 1), ('bird', 0), ('fish', 6), ('aardvark', 5), ('dog', 1), ('bird', 0), ('fish', 6), ('aardvark', 2), ('egret', 4), ('bird', 0), ('bird', 0), ('bird', 0), ('dog', 5), ('bird', 0), ('aardvark', 1)]}
    
        test_list_of_lists = []
        for k, v in dict_of_items.items():
            curr_list = []
            for aTuple in v:
                curr_list.append(aTuple[1])
            test_list_of_lists.append(curr_list)
    
        print(test_list_of_lists)   
    
    main()
    

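    But this just incorrectly merges everything into a single list: [[0, 3, 2, 0, 1, 3, 1, 0, 6, 5, 1, 0, 6, 2, 4, 0, 0, 0, 5, 0, 1]]

    I also looked at this post, but the dictionary there has a much simpler format.

    2 Answers  |  as of 5 years ago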
        1
  •  1
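  •   Dani Mesejo    5 years ago

    To compute the dot product of the values of dict_of_search against each entry of dict_of_items, you can convert each list of tuples to a dictionary and treat any missing word as 0: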

    def prod(source, target):
        return sum(source.get(key, 0) * target.get(key, 0) for key in source.keys() | target.keys())
    
    
    dict_of_items = {1: [('dog', 3), ('bird', 0)], 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                     3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                     4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)], 5: [('egret', 4), ('bird', 0)],
                     6: [('bird', 0)], 7: [('dog', 5), ('bird', 0)], 8: [('bird', 0), ('aardvark', 1)]}
    
    dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}
    
    for k, v in dict_of_items.items():
        for se in dict_of_search.values():
            print(k, prod(dict(v), dict(se)))
    

    Output

    1 3
    2 9
    3 1
    4 1
    5 0
    6 0
    7 5
    8 0
    

    If you want to store the results in a dictionary, do the following:

    result = {}
    for k, v in dict_of_items.items():
        for se in dict_of_search.values():
            result[k] = prod(dict(v), dict(se))
    
    print(result)
    

    Output

    {1: 3, 2: 9, 3: 1, 4: 1, 5: 0, 6: 0, 7: 5, 8: 0}
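
    The question also asks for the result to be ordered by dot product, descending. A minimal sketch of one way to do that, assuming Python 3.7+ (where dictionaries keep insertion order) and assuming, as the question states, that dict_of_search holds exactly one entry. The name ranked is made up for this illustration:

```python
# Data copied from the question.
dict_of_items = {1: [('dog', 3), ('bird', 0)], 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)], 5: [('egret', 4), ('bird', 0)],
                 6: [('bird', 0)], 7: [('dog', 5), ('bird', 0)], 8: [('bird', 0), ('aardvark', 1)]}

dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}


def prod(source, target):
    # A word missing from either side contributes 0 to the sum.
    return sum(source.get(key, 0) * target.get(key, 0) for key in source.keys() | target.keys())


# Assumption: dict_of_search has exactly one entry, so take its only value.
search = dict(next(iter(dict_of_search.values())))

result = {k: prod(dict(v), search) for k, v in dict_of_items.items()}

# sorted() is stable, so keys with equal dot products keep their original order.
ranked = dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))

print(ranked)
# {2: 9, 7: 5, 1: 3, 3: 1, 4: 1, 5: 0, 6: 0, 8: 0}
```

    If only the ordering matters, the list of (key, dot product) pairs returned by sorted() can be used directly without converting it back to a dictionary.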
    
        2
  •  0
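  •   Dev Khadka    5 years ago

    It gets easier if you first convert your lists of tuples into dictionaries, as below. Then a single dict comprehension computes every dot product; it only needs to iterate over the search keys, because any word missing from the search vector contributes 0 anyway: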

    dict_of_items = {key:dict(value) for key, value in dict_of_items.items()}
    dict_of_search = {key:dict(value) for key, value in dict_of_search.items()}
    
    {item_key: sum([search[key]*item.get(key,0)  for key in search.keys()]) 
         for item_key, item in dict_of_items.items() 
         for search in dict_of_search.values()}
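
    The array-based attempt in the question fails only because the two lists are not indexed by the same words. A rough sketch of aligning both sides on a shared vocabulary so the vectors have equal length; vocab, to_vector and dot are hypothetical names introduced here, and the plain-Python dot could just as well be replaced by numpy.dot once the lists line up:

```python
# Data copied from the question.
dict_of_items = {1: [('dog', 3), ('bird', 0)], 2: [('egret', 2), ('cat', 3), ('bird', 0), ('aardvark', 1)],
                 3: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 5)],
                 4: [('fish', 6), ('bird', 0), ('dog', 1), ('aardvark', 2)], 5: [('egret', 4), ('bird', 0)],
                 6: [('bird', 0)], 7: [('dog', 5), ('bird', 0)], 8: [('bird', 0), ('aardvark', 1)]}

dict_of_search = {1: [('bird', 0), ('dog', 1), ('cat', 3)]}

# Shared vocabulary: every word seen on either side, in a fixed (sorted) order.
all_pairs = list(dict_of_items.values()) + list(dict_of_search.values())
vocab = sorted({word for pairs in all_pairs for word, _ in pairs})


def to_vector(pairs, vocab):
    # One slot per vocabulary word; words absent from pairs get weight 0.
    weights = dict(pairs)
    return [weights.get(word, 0) for word in vocab]


def dot(a, b):
    # Plain-Python dot product of two equal-length lists.
    return sum(x * y for x, y in zip(a, b))


# Assumption: dict_of_search has exactly one entry, as the question states.
search_vec = to_vector(next(iter(dict_of_search.values())), vocab)

scores = {k: dot(to_vector(v, vocab), search_vec) for k, v in dict_of_items.items()}

print(vocab)
# ['aardvark', 'bird', 'cat', 'dog', 'egret', 'fish']
print(search_vec)
# [0, 0, 3, 1, 0, 0]
print(scores)
# {1: 3, 2: 9, 3: 1, 4: 1, 5: 0, 6: 0, 7: 5, 8: 0}
```

    For a small vocabulary like this one, the dictionary-based versions above do the same work without building the zero-padded lists; explicit vectors mostly pay off when the same documents are compared against many queries.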