代码之家 › 专栏 › 技术社区 › conorgriffin

我能使这个C++代码更快,而不是使它变得更复杂吗?

performance c++

conorgriffin · 技术社区 · 14 年前

请记住,我正在学习C++,所以我的目标是改进我理解的代码,而不仅仅是给出一个完美的解决方案。

谢谢

此问题的目的是验证用于读取输入数据的方法是否足够快,以处理带有大量输入/输出警告的问题。您希望在运行时每秒至少能够处理2.5MB的输入数据。处理测试数据的时间限制为8秒。

输入以两个正整数n k(n,k<=10^7). 接下来的n行输入包含一个正整数ti,每个不大于10^9。输出

将单个整数写入输出,表示有多少整数ti可被k整除。

输入:

7 3
1.

966369

9
999996
11

输出:

#include <iostream>
#include <stdio.h>
using namespace std;

int main(){
  //n is number of integers to perform calculation on
  //k is the divisor
  //inputnum is the number to be divided by k
  //total is the total number of inputnums divisible by k

  int n,k,inputnum,total;

  //initialize total to zero
  total=0;

  //read in n and k from stdin
  scanf("%i%i",&n,&k);

  //loop n times and if k divides into n, increment total
  for (n; n>0; n--)
  {
    scanf("%i",&inputnum);
    if(inputnum % k==0) total += 1;
  }

 //output value of total
 printf("%i",total);
 return 0;
}

10 回复 | 直到 4 年前

Andrew Dalke 14 年前

这并不是最优雅的代码,我可能有一些边缘案例,但这足以让您使用更快的方法。

[xavier:~/tmp] dalke% g++ -O3 my_solution.cpp
[xavier:~/tmp] dalke% time ./a.out < c.dat
15728647
0.284u 0.057s 0:00.39 84.6% 0+0k 0+1io 0pf+0w
[xavier:~/tmp] dalke% g++ -O3 your_solution.cpp
[xavier:~/tmp] dalke% time ./a.out < c.dat
15728647
3.585u 0.087s 0:03.72 98.3% 0+0k 0+0io 0pf+0w

这是密码。

#include <iostream>
#include <stdio.h>
using namespace std;

const int BUFFER_SIZE=400000;
const int EXTRA=30;  // well over the size of an integer 

void read_to_newline(char *buffer) {
  int c;
  while (1) {
    c = getc_unlocked(stdin);
    if (c == '\n' || c == EOF) {
      *buffer = '\0';
      return;
    }
    *buffer++ = c;
  }
} 

int main() {
  char buffer[BUFFER_SIZE+EXTRA];
  char *end_buffer;
  char *startptr, *endptr;

  //n is number of integers to perform calculation on
  //k is the divisor
  //inputnum is the number to be divided by k
  //total is the total number of inputnums divisible by k

  int n,k,inputnum,total,nbytes;

  //initialize total to zero
  total=0;

  //read in n and k from stdin
  read_to_newline(buffer);
  sscanf(buffer, "%i%i",&n,&k);

  while (1) {
    // Read a large block of values
    // There should be one integer per line, with nothing else.
    // This might truncate an integer!
    nbytes = fread(buffer, 1, BUFFER_SIZE, stdin);
    if (nbytes == 0) {
      cerr << "Reached end of file too early" << endl;
      break;
    }
    // Make sure I read to the next newline.
    read_to_newline(buffer+nbytes);

    startptr = buffer;
    while (n>0) {
      inputnum = 0;
      // I had used strtol but that was too slow
      //   inputnum = strtol(startptr, &endptr, 10);
      // Instead, parse the integers myself.
      endptr = startptr;
      while (*endptr >= '0') {
        inputnum = inputnum * 10 + *endptr - '0';
        endptr++;
      }
      // *endptr might be a '\n' or '\0'

      // Might occur with the last field
      if (startptr == endptr) {
        break;
      }
      // skip the newline; go to the
      // first digit of the next number.
      if (*endptr == '\n') {
        endptr++;
      }
      // Test if this is a factor
      if (inputnum % k==0) total += 1;

      // Advance to the next number
      startptr = endptr;

      // Reduce the count by one
      n--;
    }
    // Either we are done, or we need new data
    if (n==0) {
      break;
    }
  }

 // output value of total
 printf("%i\n",total);
 return 0;
}

wallyk 14 年前

速度不是由计算决定的,程序运行的大部分时间是由i/o消耗的。

setvbuf 在第一个之前打电话 scanf 对于一个显著的改进:

setvbuf(stdin, NULL, _IOFBF, 32768);
setvbuf(stdout, NULL, _IOFBF, 32768);

--编辑--

幻数是新的缓冲区大小。默认情况下,文件使用512字节的缓冲区。增加这个大小会减少C++运行库向操作系统发出读写调用的次数,这是目前为止算法中最昂贵的操作。

1024*10 或 1024*1024 取决于要在其上运行的系统。对于16位系统,大于32K或64K的缓冲区大小通常会导致分配缓冲区和管理缓冲区的困难。对于任何更大的系统,根据可用内存和它将要竞争的其他内容,使其尽可能大。

如果没有任何已知的内存争用,请为缓冲区选择大约与关联文件大小相同的大小。也就是说,如果输入文件为250K,则将其用作缓冲区大小。随着缓冲区大小的增加,回报肯定会减少。对于250K示例,100K缓冲区需要三次读取,而默认512字节缓冲区需要500次读取。进一步增加缓冲区大小以便只需要一次读取不太可能比三次读取有显著的性能改进。

Anycorn 14 年前

尝试将if语句替换为 count += ((n%k)==0);

但我认为您确实需要将输入缓冲到临时数组中。一次从输入中读取一个整数是昂贵的。若你们能将数据采集和数据处理分开,那个么编译器就能够为数学运算生成优化的代码。

mloskot 14 年前

虽然你的例子很简单,我几乎看不出你能消除什么——假设这是问题的一部分,那么接下来阅读stdin。

对代码的一些注释:您的示例没有使用任何流-不需要包括iostream头。您已经通过将STLIO.H替代C++版本的头CSDIO,将C库元素加载到全局命名空间,因此不需要使用名称空间STD。

Anonym Mus 14 年前

您可以使用gets()读取每一行,自己在不使用scanf()的情况下解析字符串(通常我不建议使用gets(),但在本例中,输入是经过良好指定的。)

#include <stdio.h>
int main() {
   int n,k,in,tot=0,i;
   char s[1024];
   gets(s);
   sscanf(s,"%d %d",&n,&k);
   while(n--) {
      gets(s);
      in=s[0]-'0';
      for(i=1; s[i]!=0; i++) {
        in=in*10 + s[i]-'0';   /* For each digit read, multiply the previous 
                                  value of in with 10 and add the current digit */
      }
      tot += in%k==0;          /* returns 1 if in%k is 0, 0 otherwise */
   }
   printf("%d\n",tot);
   return 0;
}

此程序比您在上面给出的解决方案(在我的机器上)快约2.6倍。

Frunsi 14 年前

DanJ 14 年前

我认为代码很好。我用不到0.3秒的时间在我的电脑上运行了它我甚至在不到一秒钟的时间里用更大的输入运行它。

你是怎么定时间的?

从total=n开始,然后在循环内:

total-=int((输入%k)/k+1)//0如果可整除,1如果不可整除

Jerry Coffin 14 年前

#include <windows.h>
#include <process.h>
#include <iostream>
#include <time.h>
#include "queue.hpp"

namespace jvc = JVC_thread_queue;

struct buffer { 
    static const int initial_size = 1024 * 1024;
    char buf[initial_size];
    size_t size;

    buffer() : size(initial_size) {}
};

jvc::queue<buffer *> outputs;

void read(HANDLE file) {
    // read data from specified file, put into buffers for processing.
    //
    char temp[32];
    int temp_len = 0;
    int i;

    buffer *b;
    DWORD read;

    do { 
        b = new buffer;

        // If we have a partial line from the previous buffer, copy it into this one.
        if (temp_len != 0)
            memcpy(b->buf, temp, temp_len);

        // Then fill the buffer with data.
        ReadFile(file, b->buf+temp_len, b->size-temp_len, &read, NULL);

        // Look for partial line at end of buffer.
        for (i=read; b->buf[i] != '\n'; --i)
            ;

        // copy partial line to holding area.
        memcpy(temp, b->buf+i, temp_len=read-i);

        // adjust size.
        b->size = i;

        // put buffer into queue for processing thread.
        // transfers ownership.
        outputs.add(b);
    } while (read != 0);
}

// A simplified istrstream that can only read int's.
class num_reader { 
    buffer &b;
    char *pos;
    char *end;
public:
    num_reader(buffer *buf) : b(*buf), pos(b.buf), end(pos+b.size) {}

    num_reader &operator>>(int &value){ 
        int v = 0;

        // skip leading "stuff" up to the first digit.
        while ((pos < end) && !isdigit(*pos))
            ++pos;

        // read digits, create value from them.
        while ((pos < end) && isdigit(*pos)) {
            v = 10 * v + *pos-'0';
            ++pos;
        }
        value = v;
        return *this;
    }

    // return stream status -- only whether we're at end
    operator bool() { return pos < end; }
};

int result;

unsigned __stdcall processing_thread(void *) {
    int value;
    int n, k;
    int count = 0;

    // Read first buffer: n & k followed by values.
    buffer *b = outputs.pop();
    num_reader input(b);
    input >> n;
    input >> k;
    while (input >> value && ++count < n) 
        result += ((value %k ) == 0);

    // Ownership was transferred -- delete buffer when finished.
    delete b;

    // Then read subsequent buffers:
    while ((b=outputs.pop()) && (b->size != 0)) {
        num_reader input(b);
        while (input >> value && ++count < n)
            result += ((value %k) == 0);

        // Ownership was transferred -- delete buffer when finished.
        delete b;
    }
    return 0;
}

int main() { 
    HANDLE standard_input = GetStdHandle(STD_INPUT_HANDLE);
    HANDLE processor = (HANDLE)_beginthreadex(NULL, 0, processing_thread, NULL, 0, NULL);

    clock_t start = clock();
    read(standard_input);
    WaitForSingleObject(processor, INFINITE);
    clock_t finish = clock();

    std::cout << (float)(finish-start)/CLOCKS_PER_SEC << " Seconds.\n";
    std::cout << result;
    return 0;
}

这使用了我多年前编写的线程安全队列类:

#ifndef QUEUE_H_INCLUDED
#define QUEUE_H_INCLUDED

namespace JVC_thread_queue { 
template<class T, unsigned max = 256>
class queue { 
    HANDLE space_avail; // at least one slot empty
    HANDLE data_avail;  // at least one slot full
    CRITICAL_SECTION mutex; // protect buffer, in_pos, out_pos

    T buffer[max];
    long in_pos, out_pos;
public:
    queue() : in_pos(0), out_pos(0) { 
        space_avail = CreateSemaphore(NULL, max, max, NULL);
        data_avail = CreateSemaphore(NULL, 0, max, NULL);
        InitializeCriticalSection(&mutex);
    }

    void add(T data) { 
        WaitForSingleObject(space_avail, INFINITE);       
        EnterCriticalSection(&mutex);
        buffer[in_pos] = data;
        in_pos = (in_pos + 1) % max;
        LeaveCriticalSection(&mutex);
        ReleaseSemaphore(data_avail, 1, NULL);
    }

    T pop() { 
        WaitForSingleObject(data_avail,INFINITE);
        EnterCriticalSection(&mutex);
        T retval = buffer[out_pos];
        out_pos = (out_pos + 1) % max;
        LeaveCriticalSection(&mutex);
        ReleaseSemaphore(space_avail, 1, NULL);
        return retval;
    }

    ~queue() { 
        DeleteCriticalSection(&mutex);
        CloseHandle(data_avail);
        CloseHandle(space_avail);
    }
};
}

#endif

你从中获得的确切数量取决于阅读所花费的时间与其他处理所花费的时间。在本例中,另一个处理非常简单,可能不会获得太多收益。如果在处理数据上花费更多的时间,多线程可能会获得更多。

Mike Dunlavey 14 年前

2.5mb/秒是400ns/字节。

有两个大的每字节进程,文件输入和解析。

fread 应该能够在大约全磁盘带宽的情况下读取。

sscanf 是为了通用性而不是速度。 atoi

#define DIGIT(c)((c)>='0' && (c) <= '9')
bool parsInt(char* &p, int& num){
  while(*p && *p <= ' ') p++; // scan over whitespace
  if (!DIGIT(*p)) return false;
  num = 0;
  while(DIGIT(*p)){
    num = num * 10 + (*p++ - '0');
  }
  return true;
}

循环,首先是在前导空格上,然后是在数字上,应该尽可能快,当然要比400ns/字节小得多。

RickNotFred 14 年前

把两个大数字分开很难。也许一个改进是首先通过观察一些较小的素数来描述k。现在让我们说2、3和5。如果k可以被其中任何一个整除,那么inputnum也需要被整除,或者inputnum不能被k整除。当然还有更多的技巧(你可以使用bitwise and Of inputnum to 1来确定你是否可以被2整除),但我认为只要去掉低素数的可能性,速度就会得到合理的提高(无论如何值得一试)。