How can I optimize this line of C code (a range check)?

math optimization c

psihodelia · 14 years ago

Is there any way to optimize the following line of C code (to avoid branching)?

if ((i < -threshold) || (i > threshold)) 
{ 
    counter++; 
}

10 answers | up to 14 years ago

Oliver Charlesworth 14 years ago

How about:

counter += (i < -threshold) | (i > threshold);

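Assuming the original code was valid, this should also work, and in a portable way. The standard says that the relational operators ( < , > and so on) return an int equal to 1 on success, or 0 on failure.

Update

To answer Sheen's comment below, the following code: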
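
As a minimal sketch of how that guarantee might be used over a whole array, the function below adds the 0-or-1 comparison result directly to a counter. The name count_outside and its parameters are hypothetical and not part of the original answer; it also assumes threshold is non-negative, as the question implies.

```c
#include <stddef.h>

/* Hypothetical helper: counts the elements lying outside
   [-threshold, threshold]. Each comparison yields an int that is exactly
   0 or 1, so the bitwise OR of the two results is also 0 or 1 and can be
   added to the counter without an if statement in the source. */
size_t count_outside(const int *values, size_t n, int threshold)
{
    size_t counter = 0;
    for (size_t k = 0; k < n; k++) {
        int i = values[k];
        counter += (size_t)((i < -threshold) | (i > threshold));
    }
    return counter;
}
```

Whether the generated machine code is actually branch-free still depends on the compiler and target, so it is worth checking the disassembly.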

int main()
{
    short threshold = 10;
    short i = 20;
    short counter = 0;

    counter += (i < -threshold) | (i > threshold);

    return 0;
}

produces the following disassembly on x86 using GCC, with no optimizations:

  push   %rbp
  mov    %rsp,%rbp
  movw   $0xa,-6(%rbp)
  movw   $0x14,-4(%rbp)
  movw   $0x0,-2(%rbp)
  movswl -4(%rbp),%edx
  movswl -6(%rbp),%eax
  neg    %eax
  cmp    %eax,%edx
  setl   %dl
  movzwl -4(%rbp),%eax
  cmp    -6(%rbp),%ax
  setg   %al
  or     %edx,%eax
  movzbw %al,%dx
  movzwl -2(%rbp),%eax
  lea    (%rdx,%rax,1),%eax
  mov    %ax,-2(%rbp)
  mov    $0x0,%eax
  leaveq 
  retq

R.. GitHub STOP HELPING ICE 14 years ago

There is a standard idiom for range checking with a single comparison instruction. It goes like this:

(unsigned)x - a <= (unsigned)b - a   /* a <= x <= b */
(unsigned)x - a < (unsigned)b - a    /* a <= x < b */

As a common real-world example (this version of isdigit is guaranteed to be correct by the standard):

(unsigned)ch - '0' < 10

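If your original type is larger than int (for instance long long ), then you need to use a larger unsigned type (for instance unsigned long long ). If a and b are constants or already have unsigned type, or if you know b-a will not overflow, you can omit the cast on b .

For this method to work, you must have a<=b , and the types/values must be such that the original expression (i.e. a <= x && x <= b or similar) behaves mathematically correctly. For instance, if x is signed and b is unsigned, x<=b can evaluate to false when x=-1 and b=UINT_MAX-1 .

As for how this "trick" works, it is purely determining, after reduction modulo UINT_MAX+1 , whether x-a lies in the range 0 to b-a .

In your case, I think the following should work just fine: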
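
To make the modular reasoning concrete, here is a small sketch of the idiom wrapped in functions. The names in_range and is_digit_char are hypothetical, and the sketch assumes a <= b and that all values are plain int.

```c
#include <stdbool.h>

/* Hypothetical helper: true when a <= x <= b, assuming a <= b.
   Unsigned subtraction is performed modulo UINT_MAX+1, so when x < a the
   difference wraps around to a value larger than b - a and the single
   comparison fails. */
bool in_range(int x, int a, int b)
{
    return (unsigned)x - (unsigned)a <= (unsigned)b - (unsigned)a;
}

/* Hypothetical helper mirroring the isdigit example: the digits '0'..'9'
   are contiguous, so one subtraction and one comparison suffice. */
bool is_digit_char(int ch)
{
    return (unsigned)ch - '0' < 10u;
}
```

For example, in_range(-1, 0, 10) computes UINT_MAX <= 10, which is false, while in_range(-5, -10, -1) computes 5 <= 9, which is true.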

(unsigned)i + threshold > 2U * threshold;

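If threshold does not change between loop iterations, the compiler can probably keep both threshold and 2U*threshold in registers.

Speaking of optimization, a good compiler should optimize the original range test to use unsigned arithmetic wherever it knows the constraints are met. I suspect many do so when a and b are constants. Even so, the (unsigned)x-a<b-a idiom remains extremely useful in macros, where it ensures that x is evaluated only once.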
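
The point about macros can be sketched as follows. OUTSIDE_THRESHOLD is a hypothetical macro name, not something from the original answer; it assumes t is a non-negative int and that unsigned int can represent 2*INT_MAX, which holds on common platforms.

```c
/* Hypothetical macro: nonzero when i < -t or i > t, for a non-negative t.
   The argument i appears only once in the expansion, so an argument with
   side effects is evaluated exactly once. (t appears twice, so it should
   be free of side effects.) */
#define OUTSIDE_THRESHOLD(i, t) \
    ((unsigned)(i) + (unsigned)(t) > 2u * (unsigned)(t))
```

With a naive two-comparison macro, an argument such as *p++ would be evaluated twice; here it is evaluated once.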

Christoffer 14 years ago

#include <stdint.h>
int main()
{
    int32_t threshold_square = 100;
    int16_t i = 20;
    int16_t counter = 0;

    counter += ( (int32_t) i * i > threshold_square);

    return 0;
}

yields the following x86 assembly using GCC without optimizations:

pushq   %rbp
movq    %rsp, %rbp
movl    $100, -8(%rbp)
movw    $20, -2(%rbp)
movw    $0, -4(%rbp)
movswl  -2(%rbp),%edx
movswl  -2(%rbp),%eax
imull   %edx, %eax
cmpl    -8(%rbp), %eax
setg    %al
movzbl  %al, %edx
movzwl  -4(%rbp), %eax
leal    (%rdx,%rax), %eax
movw    %ax, -4(%rbp)
movl    $0, %eax
leave
ret

which is four instructions fewer than using (i < -threshold) | (i > threshold) .

Whether this is better or not of course depends on the architecture.
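
As a rough sketch, the squaring approach could be packaged like this. The function name outside_by_square is hypothetical, and it assumes threshold is non-negative; unlike the listing above, it squares the threshold on each call rather than precomputing it.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when |i| > threshold, else 0.
   Both squares are computed in 32 bits, so a 16-bit input cannot
   overflow (the largest square, 32768 * 32768, is 2^30). */
int outside_by_square(int16_t i, int16_t threshold)
{
    int32_t threshold_square = (int32_t)threshold * threshold;
    return (int32_t)i * i > threshold_square;
}
```

In a loop with a fixed threshold, threshold_square would normally be computed once outside the loop, as in the listing above.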

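Nico 14 years ago

Depending on the distribution of the values of 'i', your CPU may well do better with its cached branch predictions than with any code change you might make. See http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/ and the discussion at http://www.reddit.com/r/programming/comments/c7ues/fast_and_slow_ifstatements_branch_prediction_in/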

mirk 14 years ago

This is based on Bit Twiddling Hacks (highly recommended):

#include <limits.h> // provides CHAR_BIT

int main()
{
  int i=-3; // example input
  int threshold=2; // example threshold
  int count=0;
  // step 1: find the absolute value of i
  unsigned int r;  // the result goes here
  // mask is -1 if i is negative, 0 otherwise (assumes an arithmetic right shift)
  int const mask = i >> (sizeof(int) * CHAR_BIT - 1);
  r = (i + mask) ^ mask;
  // step 2: compute the sign of the difference
  // sign becomes 0 (if r<=threshold)
  // sign becomes 1 otherwise
  int sign = 1 ^ ((unsigned int)(r-threshold-1) >> (sizeof(int) * CHAR_BIT - 1));
  count+=sign;
  return count;
}

It compiles with g++.

The speed depends on the processor used. Branching may turn out to be faster after all.

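Sparky 14 years ago

I think Oli Charlesworth has the right idea. However, I suspect it can be optimized further (at the expense of readability).

The threshold can be normalized to zero to remove one comparison: adding threshold to i turns the test into "is i + threshold below 0 or above 2 * threshold", and an unsigned comparison covers both cases at once.

counter += ((unsigned) (i + threshold)  > (unsigned) (threshold + threshold));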

Skizz 14 years ago

You can use the following trick, which reduces the branches to a single branch:

if (((unsigned) (i + threshold)) > (threshold << 1)) 
{ 
  counter++; 
}

Or, with every operand explicitly cast to unsigned:

if (((unsigned) i + (unsigned) threshold) > ((unsigned) threshold << 1)) 
{ 
  counter++; 
}

Vovanium 14 years ago

This code has no branch and is highly portable (however, the implementation of abs may have one).

#include <stdlib.h>
counter += abs(i) > threshold;

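This is the simplest standard-compliant expression.

If your compiler does not use an optimized macro for abs(), you can use your own macro or inline function. Either of the following definitions would do, assuming x is a 16-bit value: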

#define ABS(x) ((x)*(((x)>>15)|1))  /* multiply by the sign: -1 or +1 */

#define ABS(x) (((x)+((x)>>15))^((x)>>15))  /* add the sign mask, then XOR with it */

You can also replace the comparison operator with an expression like this:

#define LESS(x, y) (-(((x)-(y))>>15))

counter -= ((threshold - abs(i)) >> 15);

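All of these macros rely on the fact that right-shifting by the number of bits minus one gives zero for a positive or zero value, and minus one for a negative value. But this is implementation-defined behaviour.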
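
One way to sidestep the implementation-defined shift, offered only as a sketch, is to derive the mask from a comparison, which the standard guarantees to be 0 or 1. The names sign_mask and abs_via_mask are hypothetical; the sketch assumes two's complement representation and x != INT_MIN.

```c
/* Hypothetical helper: -1 for a negative x, 0 otherwise.
   (x < 0) is exactly 0 or 1, so negating it gives 0 or -1 without
   relying on how >> treats negative values. */
int sign_mask(int x)
{
    return -(x < 0);
}

/* Hypothetical helper: absolute value via the mask.
   Assumes two's complement and x != INT_MIN. For a negative x this
   computes ~(x - 1), which equals -x; for other x it returns x as-is. */
int abs_via_mask(int x)
{
    int m = sign_mask(x);
    return (x + m) ^ m;
}
```

A compiler may still implement the comparison with a shift or a set-on-condition instruction; the difference is that the result no longer depends on implementation-defined behaviour.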

Ahmed 14 years ago

Compare the absolute values of both:

short imask = i >> sizeof(short) * 8 - 1; // sign mask: -1 if i is negative, 0 otherwise
short tmask = threshold >> sizeof(short) * 8 - 1; // sign mask: -1 if threshold is negative, 0 otherwise

short iabsolute = (i + imask) ^ imask; // compute i absolute
short tabsolute = (threshold + tmask) ^ tmask; // compute threshold absolute

counter += iabsolute > tabsolute;

xlq 14 years ago

Any good compiler should be able to optimize this perfectly well. Any hand optimization is likely to lead only to obfuscation.