
Why is a numpy array serialized to BSON so much larger than the original array?

  • Eduardo · 7 years ago

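    A numpy array cannot be serialized with json.dump. I know about this, but I'm wondering whether there is a better way, because converting a numpy array of bytes to BSON multiplies the number of bytes by almost 12 (and I don't understand why):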

    import numpy as np
    import bson
    
    RC = 500
    npdata = np.zeros(shape=(RC, RC, 3), dtype='B')   # 500 x 500 x 3 uint8 array
    rows, cols, depth = npdata.shape
    npsize = rows * cols * depth                       # 750,000 bytes of raw data
    npdata = npdata.reshape((npsize,))                 # flatten to 1-D
    listdata = npdata.tolist()                         # list of Python ints
    bsondata = bson.BSON.encode({"rows": rows, "cols": cols, "data": listdata})
    lb = len(bsondata)
    print(lb, npsize, lb / npsize)
    
    > 8888926 750000 11.851901333333334 
    
  • MB-F · 7 years ago

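    The increase in size comes from the way BSON stores the data. You can find the details in the BSON specification, but a concrete example makes it clearer: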

    import numpy as np
    import bson
    
    npdata = np.arange(5, dtype='B') * 11   # uint8 values [0, 11, 22, 33, 44]
    listdata = npdata.tolist()              # convert to a list of Python ints
    bsondata = bson.BSON.encode({"data": listdata})
    
    print([hex(b) for b in bsondata])
    

    Here we store an array containing the values [0, 11, 22, 33, 44] as BSON and print the resulting binary data. I annotated the output below to explain what is going on:

    ['0x33', '0x0', '0x0', '0x0',  # total number of bytes in the document (51)
     # First element in document
         '0x4',  # type: array
         '0x64', '0x61', '0x74', '0x61', '0x0',  # key: "data"
         # subdocument (data array)
             '0x28', '0x0', '0x0', '0x0',  # total number of bytes in the array (40)
             # first element in data array
                 '0x10',                        # type: 32-bit integer
                 '0x30', '0x0',                 # key: "0"
                 '0x0', '0x0', '0x0', '0x0',    # value: 0
             # second element in data array
                 '0x10',                        # type: 32-bit integer
                 '0x31', '0x0',                 # key: "1"
                 '0xb', '0x0', '0x0', '0x0',    # value: 11
             # third element in data array
                 '0x10',                        # type: 32-bit integer
                 '0x32', '0x0',                 # key: "2"
                 '0x16', '0x0', '0x0', '0x0',   # value: 22
             # ... fourth and fifth elements (33 and 44) follow the same pattern
             '0x0',  # end of data array
     '0x0',  # end of document
    ]
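    Applying that per-element layout to the question's document accounts for the factor of almost 12. Each single-byte value becomes a BSON int32 element costing 1 type byte, the array index written as a decimal string plus a NUL terminator, and 4 value bytes. With 750,000 elements most indices have 6 digits, so most elements take 12 bytes. The sketch below (plain Python; the helper names are made up for illustration) adds up those sizes following the BSON specification without encoding anything:

```python
def int32_element_size(key: str) -> int:
    # type byte (0x10) + key bytes + NUL terminator + 4-byte little-endian value
    return 1 + len(key.encode("utf-8")) + 1 + 4


def array_size(n: int) -> int:
    # a BSON array is an embedded document with keys "0", "1", ..., str(n - 1):
    # 4-byte total length + elements + trailing NUL
    return 4 + sum(int32_element_size(str(i)) for i in range(n)) + 1


def question_document_size(n: int) -> int:
    # {"rows": int32, "cols": int32, "data": [n int32 values]}
    data_element = 1 + len("data") + 1 + array_size(n)  # type 0x04 + key + NUL + array
    body = int32_element_size("rows") + int32_element_size("cols") + data_element
    return 4 + body + 1  # 4-byte total length + elements + trailing NUL


n = 500 * 500 * 3
size = question_document_size(n)
print(size, n, size / n)  # 8888926 750000 11.851901333333334, same as the question
```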
    

    I also found two libraries that deal with BSON and numpy: mongodb/bson-numpy and ajdavis/bson-numpy (both on GitHub).
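    Another way to avoid the per-element overhead is to store the raw bytes as one BSON binary element (type 0x05) instead of an array of int32 elements. The sketch below hand-builds such a document with `struct` following the BSON specification, only to show the resulting size; the helper functions are hypothetical and not part of pymongo, and a zero-filled `bytes` object stands in for `npdata.tobytes()`:

```python
import struct


def cstring(key: str) -> bytes:
    # BSON keys are NUL-terminated UTF-8 strings
    return key.encode("utf-8") + b"\x00"


def int32_element(key: str, value: int) -> bytes:
    # type 0x10 (int32) + key + 4-byte little-endian signed value
    return b"\x10" + cstring(key) + struct.pack("<i", value)


def binary_element(key: str, data: bytes) -> bytes:
    # type 0x05 (binary) + key + int32 length + subtype 0x00 (generic) + raw bytes
    return b"\x05" + cstring(key) + struct.pack("<i", len(data)) + b"\x00" + data


def document(*elements: bytes) -> bytes:
    # int32 total size (counting these 4 bytes and the trailing NUL) + elements + NUL
    body = b"".join(elements)
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


rows, cols, depth = 500, 500, 3
raw = bytes(rows * cols * depth)  # stand-in for npdata.tobytes(): 750,000 zero bytes
doc = document(
    int32_element("rows", rows),
    int32_element("cols", cols),
    binary_element("data", raw),
)
print(len(doc), len(raw), len(doc) / len(raw))  # 750036 750000 1.000048
```

    In practice there should be no need to hand-roll this: as far as I know, pymongo encodes a Python 3 `bytes` value as binary subtype 0, so passing `npdata.tobytes()` instead of `npdata.tolist()` should give the same layout, and `np.frombuffer(decoded["data"], dtype=np.uint8).reshape(rows, cols, depth)` (with `decoded` being the decoded document) rebuilds the array. The trade-off is that the individual values are no longer visible to MongoDB queries.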