
Python: compactly and reversibly encode a large integer as base64 or base16, with variable or fixed length

  •  -3
  • Asclepius  · Tech Community  · 6 years ago

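    I want to compactly encode large unsigned or signed integers with an arbitrary number of bits into a base64, base32, or base16 (hexadecimal) representation. The output will ultimately be used as a string that serves as a filename, but that should be beside the point. I am using the latest Python 3.

    This works, but it is far from compact, because it encodes the decimal digits of the integer rather than its binary value: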

    >>> import base64, sys
    >>> i: int = 2**62 - 3  # Can be signed or unsigned.
    >>> b64: bytes = base64.b64encode(str(i).encode())  # Not a compact encoding.
    >>> len(b64), sys.getsizeof(b64)
    (28, 61)
    

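    There is a prior question, now closed, whose answers deal strictly with inefficient representations. Note again that in this exercise we do not want to use any strings or unnecessarily long byte sequences. This question is therefore not a duplicate of that one.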

    1 answer  |  up to 6 years ago
        1
  •  0
  •   Asclepius    6 years ago

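    This answer is motivated in part by various comments by Erik A, e.g. on this answer. The integer is first converted compactly to bytes, and the bytes are then encoded in a selectable base.

    For usage examples, see the included tests.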
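    The two steps described above can be sketched in isolation, using the question's own value for comparison. This is only a minimal illustration: `compact_b64` is a hypothetical helper name, not part of the answer's class, and it hard-codes standard base64.

```python
import base64

def compact_b64(i: int, signed: bool = False) -> bytes:
    """Hypothetical helper: int -> minimal big-endian bytes -> base64."""
    # One extra bit is reserved for the sign when signed=True.
    n_bytes = (i.bit_length() + 7 + signed) // 8
    return base64.b64encode(i.to_bytes(n_bytes, byteorder='big', signed=signed))

i = 2**62 - 3
naive = base64.b64encode(str(i).encode())  # 19 decimal digits -> 19 bytes
compact = compact_b64(i)                   # 62 bits -> 8 bytes

print(len(naive), len(compact))  # 28 12
# The compact form is still reversible.
print(int.from_bytes(base64.b64decode(compact), byteorder='big') == i)  # True
```

    The saving comes entirely from the first step: 8 bytes instead of 19 go into the base64 encoder.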

    from typing import Callable, Optional
    import base64
    
    class IntBaseEncoder:
        """Reversibly encode an unsigned or signed integer into a customizable encoding of a variable or fixed length."""
        # Ref: https://stackoverflow.com/a/54152763/
        def __init__(self, encoding: str, *, bits: Optional[int] = None, signed: bool = False):
            """
            :param encoding: Name of encoding from base64 module, e.g. b64, urlsafe_b64, b32, b16, etc.
            :param bits: Max bit length of int which is to be encoded. If specified, the encoding is of a fixed length,
            otherwise of a variable length.
            :param signed: If True, integers are considered signed, otherwise unsigned.
            """
            self._decoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}decode')
            self._encoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}encode')
            self.signed: bool = signed
            self.bytes_length: Optional[int] = bits and self._bytes_length(2 ** bits - 1)
    
        def _bytes_length(self, i: int) -> int:
            return (i.bit_length() + 7 + self.signed) // 8
    
        def encode(self, i: int) -> bytes:
            length = self.bytes_length or self._bytes_length(i)
            i_bytes = i.to_bytes(length, byteorder='big', signed=self.signed)
            return self._encoder(i_bytes)
    
        def decode(self, b64: bytes) -> int:
            i_bytes = self._decoder(b64)
            return int.from_bytes(i_bytes, byteorder='big', signed=self.signed)
    
    # Tests:
    import unittest
    
    class TestIntBaseEncoder(unittest.TestCase):
    
        ENCODINGS = ('b85', 'b64', 'urlsafe_b64', 'b32', 'b16')
    
        def test_unsigned_with_variable_length(self):
            for encoding in self.ENCODINGS:
                encoder = IntBaseEncoder(encoding)
                previous_length = 0
                for i in range(1234):
                    encoded = encoder.encode(i)
                    # The unsigned encoding must never get shorter as i grows.
                    self.assertGreaterEqual(len(encoded), previous_length)
                    previous_length = len(encoded)
                    self.assertEqual(i, encoder.decode(encoded))
    
        def test_signed_with_variable_length(self):
            for encoding in self.ENCODINGS:
                encoder = IntBaseEncoder(encoding, signed=True)
                for i in range(-1234, 1234):
                    # Length is not monotonic across the sign change, so only the round trip is checked.
                    encoded = encoder.encode(i)
                    self.assertEqual(i, encoder.decode(encoded))
    
        def test_unsigned_with_fixed_length(self):
            for encoding in self.ENCODINGS:
                for maxint in range(257):
                    encoder = IntBaseEncoder(encoding, bits=maxint.bit_length())
                    maxlen = len(encoder.encode(maxint))
                    for i in range(maxint + 1):
                        encoded = encoder.encode(i)
                        self.assertEqual(len(encoded), maxlen)
                        self.assertEqual(i, encoder.decode(encoded))
    
        def test_signed_with_fixed_length(self):
            for encoding in self.ENCODINGS:
                for maxint in range(257):
                    encoder = IntBaseEncoder(encoding, bits=maxint.bit_length(), signed=True)
                    maxlen = len(encoder.encode(maxint))
                    for i in range(-maxint, maxint + 1):
                        encoded = encoder.encode(i)
                        self.assertEqual(len(encoded), maxlen)
                        self.assertEqual(i, encoder.decode(encoded))
    
    if __name__ == '__main__':
        unittest.main()
    

    If the output is to be used as a filename, initializing the encoder with the encoding 'urlsafe_b64' or even 'b16' is a safer choice.

    Usage examples:
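    To illustrate why the choice of encoding matters for filenames, the sketch below encodes the same two bytes three ways. The value 65535 is chosen only because its standard base64 form contains '/', a path separator. Stripping the '=' padding is an optional extra assumed here, not something the class above does.

```python
import base64

raw = (65535).to_bytes(2, byteorder='big')  # b'\xff\xff'

std = base64.b64encode(raw)           # may contain '+' and '/'
safe = base64.urlsafe_b64encode(raw)  # uses '-' and '_' instead
hexed = base64.b16encode(raw)         # digits and uppercase A-F only
print(std, safe, hexed)  # b'//8=' b'__8=' b'FFFF'

# Optional: drop the '=' padding for the name and restore it before decoding.
name = safe.rstrip(b'=').decode('ascii')
padded = name + '=' * (-len(name) % 4)
print(base64.urlsafe_b64decode(padded) == raw)  # True
```

    URL-safe base64 is still case-sensitive, so on a case-insensitive file system two different integers could collide; 'b16' or 'b32' avoids that at the cost of longer names.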

    # Variable length encoding
    >>> encoder = IntBaseEncoder('urlsafe_b64')
    >>> encoder.encode(12345)
    b'MDk='
    >>> encoder.decode(_)
    12345
    
    # Fixed length encoding
    >>> encoder = IntBaseEncoder('b16', bits=32)
    >>> encoder.encode(12345)
    b'00003039'
    >>> encoder.encode(123456789)
    b'075BCD15'
    >>> encoder.decode(_)
    123456789
    
    # Signed
    >>> encoder = IntBaseEncoder('b32', signed=True)
    >>> encoder.encode(-12345)
    b'Z7DQ===='
    >>> encoder.decode(_)
    -12345
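
    One consequence of the fixed-length mode is worth noting: the byte width is decided up front, so a value that does not fit raises `OverflowError` from `int.to_bytes`. The sketch below shows this with plain `int.to_bytes`, using the same width formula as the class for an unsigned encoder. `fits` is a hypothetical helper introduced only for illustration.

```python
def fits(i: int, bits: int) -> bool:
    """Hypothetical check: can i be stored unsigned in the width derived from bits?"""
    n_bytes = (bits + 7) // 8  # width the class derives for an unsigned encoder
    try:
        i.to_bytes(n_bytes, byteorder='big', signed=False)
    except OverflowError:
        # Raised for values that are too large and for negative values.
        return False
    return True

print(fits(2**32 - 1, 32))  # True
print(fits(2**32, 32))      # False
print(fits(-1, 32))         # False
```

    A caller who expects out-of-range input may prefer to validate first or to use the variable-length mode.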