
Which is faster: x<<1 or x<<10?

  •  82
  • Armen Tsirunyan  · Tech Community  · 14 years ago

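    I don't want to optimize anything, I swear; I'm asking purely out of curiosity. I know that most hardware has an assembly instruction for bit shifting (e.g. shl, shr), and that it is a single instruction. But does it matter, in nanoseconds or in CPU cycles, how many bits you shift by? In other words, is either of the following faster on any CPU?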

    x << 1;
    

    x << 10;
    

    Please don't hate me for this question. :)

    9 answers  |  last active 7 years ago
        1
  •  83
  •   nimrodm    14 years ago

        2
  •  62
  •   Ben Voigt    14 years ago

    On a processor that only has a shift-by-one instruction, x << 3 has to be carried out as ((x << 1) << 1) << 1.
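
    To make the cost argument concrete, here is a minimal C sketch of a shift built only from one-bit shifts. The name shl_by_ones is hypothetical, and the loop only models what a target without a multi-bit shifter would have to execute; it is not the output of any particular compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models a target whose only shift instruction
   moves one bit at a time, so a shift by n costs n single-bit steps. */
uint32_t shl_by_ones(uint32_t x, unsigned n)
{
    while (n-- > 0) {
        x <<= 1;   /* one single-bit shift per iteration */
    }
    return x;
}

/* Self-check: the repeated one-bit form matches the direct shift. */
void check_shl_by_ones(void)
{
    assert(shl_by_ones(5u, 3) == (5u << 3));
    assert(shl_by_ones(1u, 10) == (1u << 10));
}
```

    On such a machine the work grows with the shift count, which is the one situation where x << 10 would plainly cost more than x << 1.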

        3
  •  28
  •   Vovanium    14 years ago

    1. x << 10

    2. x << 1

    add x,x
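
    The add x,x form rests on the fact that doubling a value and shifting it left by one bit give the same result. A small C sketch of that identity follows; the function names are illustrative, and unsigned arithmetic is used so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Doubling written as a one-bit left shift. */
uint32_t twice_shift(uint32_t x) { return x << 1; }

/* Doubling written as an addition. Unsigned overflow wraps modulo 2^32,
   so this matches the shift for every input. */
uint32_t twice_add(uint32_t x) { return x + x; }
```

    Because the two functions agree for every input, a compiler is free to pick whichever instruction is cheaper on the target.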

        4
  •  9
  •   onemasse    14 years ago

        5
  •  9
  •   Mike Dunlavey    14 years ago
        6
  •  7
  •   the wolf    14 years ago

          shift-expression:
                  additive-expression
                  shift-expression <<  additive-expression
                  shift-expression >>  additive-expression
    

    x = y << z;
    

    x = y >> z;
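
    Two consequences of the grammar above are easy to miss. The right operand of a shift is an additive-expression, so + and - bind tighter than << and >>. The rule is also left-recursive, so a chain of shifts groups left to right. A short C sketch, with illustrative function names:

```c
#include <assert.h>

/* Groups as 1u << (2 + 3), because the right operand of << is an
   additive-expression. Grouping the other way would give (1u << 2) + 3. */
unsigned additive_binds_tighter(void) { return 1u << 2 + 3; }

/* Groups as (256u >> 2) >> 1, because shift-expression is left-recursive.
   Grouping the other way would give 256u >> (2 >> 1). */
unsigned shifts_group_left(void) { return 256u >> 2 >> 1; }
```

    Compilers may warn about the missing parentheses in the first function; the warning is there because this grouping often surprises people.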
    

        7
  •  7
  •   Robert    7 years ago

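    Consider x<<1 and x<<10 on a 16-bit value stored as two 8-bit bytes, with byte1 the high byte and byte2 the low byte. x<<1 takes two steps, because the top bit of byte2 carries into byte1:

    byte1 = (byte1 << 1) | (byte2 >> 7)
    byte2 = (byte2 << 1)
    

    x<<10 shifts the old high byte out entirely, so it reduces to:

    byte1 = (byte2 << 2)
    byte2 = 0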
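
    The byte-wise forms above can be checked against an ordinary 16-bit shift. The sketch below assumes byte1 is the high byte and byte2 the low byte, which is read off the code rather than stated; the struct and helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit value held as two 8-bit halves. Assumption: byte1 is the
   high byte and byte2 the low byte. */
struct pair16 { uint8_t byte1; uint8_t byte2; };

struct pair16 split16(uint16_t x)
{
    struct pair16 p = { (uint8_t)(x >> 8), (uint8_t)(x & 0xFFu) };
    return p;
}

uint16_t join16(struct pair16 p)
{
    return (uint16_t)(((unsigned)p.byte1 << 8) | p.byte2);
}

/* x << 1: the top bit of the low byte carries into the high byte. */
struct pair16 shl1_bytes(struct pair16 p)
{
    p.byte1 = (uint8_t)((p.byte1 << 1) | (p.byte2 >> 7));
    p.byte2 = (uint8_t)(p.byte2 << 1);
    return p;
}

/* x << 10: the old high byte is shifted out entirely. */
struct pair16 shl10_bytes(struct pair16 p)
{
    p.byte1 = (uint8_t)(p.byte2 << 2);
    p.byte2 = 0;
    return p;
}
```

    In this layout the larger shift count is the cheaper one, since it needs no carry between the two bytes.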
    

        8
  •  5
  •   R.. GitHub STOP HELPING ICE    14 years ago

        9
  •  3
  •   Peter Cordes    7 years ago

    x << 1 gives the same result as x+x, so it can be implemented with an add instead of a shift.

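    On Intel Haswell, for example, add can run on more execution ports than shl, so it has higher throughput; see the instruction tables at http://agner.org/optimize/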

    Shifts by a variable count take the count in cl. In machine code, shl eax,1 and add eax,eax are each one byte shorter than shl eax,10.

    int arr[]; arr[x<<1]


    If x is in edi and the result is wanted in eax, x<<10 takes a mov followed by a shl.

    LEA instruction lets you shift-and-add

    gcc and clang both optimize these functions the same way, as you can see on the Godbolt compiler explorer

    int shl1(int x) { return x<<1; }
        lea     eax, [rdi+rdi]   # 1 cycle latency, 1 uop
        ret
    
    int shl2(int x) { return x<<2; }
        lea     eax, [4*rdi]    # longer encoding: needs a disp32 of 0 because there's no base register, only scaled-index.
        ret
    
    int times5(int x) { return x * 5; }
        lea     eax, [rdi + 4*rdi]
        ret
    
    int shl10(int x) { return x<<10; }
        mov     eax, edi         # 1 uop, 0 or 1 cycle latency
        shl     eax, 10          # 1 uop, 1 cycle latency
        ret
    

    lea eax, [rdi + rsi + 123] adds two registers and a constant in one instruction. See: Why is this C++ code faster than my hand-written assembly for testing the Collatz conjecture?

    On mov, see: Can x86's MOV really be "free"? Why can't I reproduce this at all?

    See also: How to multiply a register by 37 using only 2 consecutive leal instructions in x86?
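
    The shift-and-add shape that LEA encodes can be written out in plain C. The sketch below shows one way to reach 5x and 37x using only statements of the form base + index*scale, with scale limited to 1, 2, 4 or 8. The function names are illustrative, this is not necessarily the decomposition used in the linked question, and whether a compiler actually emits lea for it depends on the compiler and its flags.

```c
#include <assert.h>
#include <stdint.h>

/* One LEA-shaped step: x + 4*x = 5*x. */
uint32_t times5_lea_shape(uint32_t x)
{
    return x + (x << 2);
}

/* Two LEA-shaped steps in a row. */
uint32_t times37_lea_shape(uint32_t x)
{
    uint32_t t = x + (x << 3);    /* x + 8*x = 9*x */
    return x + (t << 2);          /* x + 4*(9*x) = 37*x */
}
```

    Unsigned types are used so the results wrap the same way a plain multiplication would.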


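    A branch such as if(x<<1) { } only depends on the low 31 bits of x, so it can be compiled with and or test, e.g. test eax, 0x7fffffff / jz .false, instead of shl eax,1 / jz.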
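
    The test eax, 0x7fffffff form works because a one-bit left shift discards bit 31, so only the low 31 bits decide whether the result is zero. A minimal C sketch of the two equivalent conditions, assuming a 32-bit unsigned x and using illustrative function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition written with a shift: true when x << 1 is nonzero. */
bool nonzero_after_shl1(uint32_t x) { return (x << 1) != 0; }

/* Same condition with no shift: x << 1 drops bit 31, so only the
   low 31 bits decide the result. */
bool nonzero_low31(uint32_t x) { return (x & 0x7fffffffu) != 0; }
```

    The interesting input is 0x80000000: it is nonzero, yet both functions return false for it.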

    move