代码之家  ›  专栏  ›  技术社区  ›  Kelly Bundy

为什么更简单的循环更慢?

  •  6
  • Kelly Bundy  · 技术社区  · 1 年前

    使用调用 n = 10**8 ,对我来说,简单循环始终比复杂循环慢得多,我不明白为什么:

    def simple(n):
        while n:
            n -= 1
    
    def complex(n):
        while True:
            if not n:
                break
            n -= 1
    

    某些时间(以秒为单位):

    simple 4.340795516967773
    complex 3.6490490436553955
    simple 4.374553918838501
    complex 3.639145851135254
    simple 4.336690425872803
    complex 3.624480724334717
    Python: 3.11.4 (main, Sep  9 2023, 15:09:21) [GCC 13.2.1 20230801]
    

    下面是字节码的循环部分,如所示 dis.dis(simple) :

      6     >>    6 LOAD_FAST                0 (n)
                  8 LOAD_CONST               1 (1)
                 10 BINARY_OP               23 (-=)
                 14 STORE_FAST               0 (n)
    
      5          16 LOAD_FAST                0 (n)
                 18 POP_JUMP_BACKWARD_IF_TRUE     7 (to 6)
    

    以及 complex :

     10     >>    4 LOAD_FAST                0 (n)
                  6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)
    
     11           8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE
    
     12     >>   12 LOAD_FAST                0 (n)
                 14 LOAD_CONST               2 (1)
                 16 BINARY_OP               23 (-=)
                 20 STORE_FAST               0 (n)
    
      9          22 JUMP_BACKWARD           10 (to 4)
    

    所以看起来复杂的一个每次迭代要做更多的工作(两次跳跃而不是一次)。那为什么它更快?

    这似乎是Python 3.11的一个现象,请参阅评论。

    基准脚本( Attempt This Online! ):

    from time import time
    import sys
    
    def simple(n):
        while n:
            n -= 1
    
    def complex(n):
        while True:
            if not n:
                break
            n -= 1
    
    for f in [simple, complex] * 3:
        t = time()
        f(10**8)
        print(f.__name__, time() - t)
    
    print('Python:', sys.version)
    
    1 回复  |  直到 1 年前
        1
  •  6
  •   Mechanic Pig    1 年前

    我检查了字节码的源代码(python 3.11.6),发现在反编译的字节码中,似乎只有 JUMP_BACKWARD 将执行一个预热功能,该功能将触发 specialization 在python 3.11中,当执行足够的次数时:

    PyObject* _Py_HOT_FUNCTION
    _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag)
    {
        /* ... */
            TARGET(JUMP_BACKWARD) {
                _PyCode_Warmup(frame->f_code);
                JUMP_TO_INSTRUCTION(JUMP_BACKWARD_QUICK);
            }
        /* ... */
    }
    
    static inline void
    _PyCode_Warmup(PyCodeObject *code)
    {
        if (code->co_warmup != 0) {
            code->co_warmup++;
            if (code->co_warmup == 0) {
                _PyCode_Quicken(code);
            }
        }
    }
    

    专业化似乎加快了使用多个字节码的速度,从而显著提高了速度:

    void
    _PyCode_Quicken(PyCodeObject *code)
    {
        /* ... */
                switch (opcode) {
                    case EXTENDED_ARG:  /* ... */
                    case JUMP_BACKWARD: /* ... */
                    case RESUME:        /* ... */
                    case LOAD_FAST:     /* ... */
                    case STORE_FAST:    /* ... */
                    case LOAD_CONST:    /* ... */
                }
        /* ... */
    }
    

    执行一次后,的字节码 complex 更改,同时 simple 没有:

    In [_]: %timeit -n 1 -r 1 complex(10 ** 8)
    2.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
    
    In [_]: dis(complex, adaptive=True)
      5           0 RESUME_QUICK             0
    
      6           2 NOP
    
      7           4 LOAD_FAST                0 (n)
                  6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)
    
      8           8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE
    
      9     >>   12 LOAD_FAST__LOAD_CONST     0 (n)
                 14 LOAD_CONST               2 (1)
                 16 BINARY_OP_SUBTRACT_INT    23 (-=)
                 20 STORE_FAST               0 (n)
    
      6          22 JUMP_BACKWARD_QUICK     10 (to 4)
    
    
    In [_]: %timeit -n 1 -r 1 simple(10 ** 8)
    4.78 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
    
    In [_]: dis(simple, adaptive=True)
      1           0 RESUME                   0
    
      2           2 LOAD_FAST                0 (n)
                  4 POP_JUMP_FORWARD_IF_FALSE     9 (to 24)
    
      3     >>    6 LOAD_FAST                0 (n)
                  8 LOAD_CONST               1 (1)
                 10 BINARY_OP               23 (-=)
                 14 STORE_FAST               0 (n)
    
      2          16 LOAD_FAST                0 (n)
                 18 POP_JUMP_BACKWARD_IF_TRUE     7 (to 6)
                 20 LOAD_CONST               0 (None)
                 22 RETURN_VALUE
            >>   24 LOAD_CONST               0 (None)
                 26 RETURN_VALUE