Which is faster: x<<1 or x<<10?

low-level cpu performance c c++

Armen Tsirunyan · Tech Community · 14 years ago

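I'm not trying to optimize anything, I swear; I'm asking purely out of curiosity. I know that most hardware has an assembly instruction for bit shifting (e.g. shl, shr), and that it is a single instruction. But does it matter (nanosecond-wise or CPU-cycle-wise) how many bits you shift by? In other words, is either of the following faster on any CPU?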

x << 1;

and

x << 10;

Please don't hate me for this question. :)

9 replies | latest 7 years ago

nimrodm 14 years ago

Ben Voigt 14 years ago

x << 3 ((x << 1) << 1) << 1

Vovanium 14 years ago

x << 10
x << 1

add x,x

onemasse 14 years ago

Mike Dunlavey 14 years ago

my favorite CPU x<<2 x<<1

the wolf 14 years ago

      shift-expression:
              additive-expression
              shift-expression <<  additive-expression
              shift-expression >>  additive-expression

x = y << z;

x = y >> z;

Robert 7 years ago

x<<1 x<<10

byte1 = (byte1 << 1) | (byte2 >> 7)
byte2 = (byte2 << 1)

byte1 = (byte2 << 2)
byte2 = 0

R.. GitHub STOP HELPING ICE 14 years ago

Peter Cordes 7 years ago

x<<1

x << 1 x+x

Intel Haswell add shl http://agner.org/optimize/ x86

cl shl eax,1 add eax,eax shl eax,10

int arr[]; arr[x<<1]

x edi eax x<<10

LEA instruction lets you shift-and-add

gcc and clang both optimize these functions the same way, as you can see on the Godbolt compiler explorer

int shl1(int x) { return x<<1; }
    lea     eax, [rdi+rdi]   # 1 cycle latency, 1 uop
    ret

int shl2(int x) { return x<<2; }
    lea     eax, [4*rdi]    # longer encoding: needs a disp32 of 0 because there's no base register, only scaled-index.
    ret

int times5(int x) { return x * 5; }
    lea     eax, [rdi + 4*rdi]
    ret

int shl10(int x) { return x<<10; }
    mov     eax, edi         # 1 uop, 0 or 1 cycle latency
    shl     eax, 10          # 1 uop, 1 cycle latency
    ret

lea eax, [rdi + rsi + 123] Why is this C++ code faster than my hand-written assembly for testing the Collatz conjecture?

mov Can x86's MOV really be "free"? Why can't I reproduce this at all?

How to multiply a register by 37 using only 2 consecutive leal instructions in x86?

if(x<<1) { } and test test eax, 0x7fffffff jz .false shl eax,1 / jz

move