代码之家 › 专栏 › 技术社区 › Kasper Holdum

类与纯数组表示

code-design arrays performance c#

Kasper Holdum · 技术社区 · 15 年前

我们需要在应用程序中表示大量数据。我们使用整数数组来实现这一点。最终产品的性能应达到最大。我们正在考虑将数组封装到类中,以便添加与数组相关的属性,例如 IS-负的 , 数字库 一模一样。

但是,我们担心使用类会扼杀我们的性能。我们做了一个测试,在这里我们创建了固定数量的数组,并通过纯数组的用法设置它的值,在这里创建了一个类,并且通过类访问数组:

for (int i = 0; i < 10000; i++)
{
    if (createClass)
    {
        BigNumber b = new BigNumber(new int[5000], 10);
        for (int j = 0; j < b.Number.Length; j++)
        {
            b[j] = 5;
        }
    }
    else
    {
        int[] test = new int[5000];
        for (int j = 0; j < test.Length; j++)
        {
            test[j] = 5;
        }
    }
}

而且,似乎使用类会将上述代码的运行时间降低大约6倍。我们只是通过将数组封装在一个结构中来尝试上述方法,这使得运行时间几乎等于纯数组使用。

与结构相比,使用类时,是什么导致了如此巨大的开销?当使用堆栈而不是堆时,这真的只是您获得的性能提升吗?

编辑: BigNumber只将数组存储在由属性公开的私有变量中。简化:

public class BigNumber{
  private int[] number;

  public BigNumber(int[] number) { this.number = number;}

  public int[] Number{get{return number;}}
}

8 回复 | 直到 15 年前

Guffa 15 年前

第二个循环比第一个循环快得多,这并不奇怪。发生的并不是类的速度异常缓慢,而是编译器很容易优化循环。

由于循环的范围是从0到test.length-1,因此编译器可以告诉索引变量永远不能在数组之外,因此它可以在通过索引访问数组时删除范围检查。

在第一个循环中,编译器不能在循环和数组之间进行连接,因此它必须根据访问的每个项的边界检查索引。

在类中封装数组时,总会有一些开销,但这并不像在测试中得到的那样多。您选择了一个编译器能够很好地优化纯数组访问的位置,所以您测试的更多是编译器优化代码的能力,而不是您要测试的内容。

Tim 15 年前

您应该在运行代码时对其进行概要分析,并查看花费的时间。

还可以考虑另一种语言,它使使用大整数变得容易。

inked 15 年前

您正在使用整数数据类型来存储一个数字,这是一个真正大数字的一部分?你说什么?

我认为你缺乏对计算机科学的基本理解。

数字0-9可以用4位表示。一个字节包含8位。所以,您可以将两个数字填充到一个字节中(这里有您的第一个加速提示)。

现在,查看一个整数占用了多少字节(提示:它将远远超过存储一个数字所需的字节数)。

杀伤力的是整数的使用,它消耗的内存是你应该消耗的4倍。使用字节或(最坏情况下)字符数组(每个字节或字符2位)来存储数字。将数字“打包”并“解包”成一个字节并不需要很多逻辑。

Henk Holterman 15 年前

从表面上看,我不希望有什么大的不同。当然不是因素6。BigNumber只是一个围绕 int[] 不是吗?如果你从BigNumber给我们看一点会有帮助的。检查你的基准。

如果你张贴一些小的东西,我们可以复制/粘贴并运行,这将是理想的选择。

Reed Copsey 15 年前

如果没有看到BigInteger的实现,就很难判断。不过,我有两个猜测。

1)通过数组测试,您的行可以通过JIT获得特殊的处理,从而消除数组边界检查。这可以给你很大的提升,尤其是因为你没有在循环中做任何“真正的工作”

for (int j = 0; j < test.Length; j++) // This removes bounds checking by JIT

2)您是否在Visual Studio之外的发布模式下对此进行计时?如果不是这样的话,这就解释了6倍的速度下降,因为Visual Studio宿主进程会人为地降低类访问速度。确保您处于释放模式,使用ctrl+f5测试计时。

Community CDub 7 年前

与其重新设计(调试和完善)轮子,不如使用现有的大整数实现更好地为您服务,这样您就可以继续进行项目的其余部分。

This SO topic 是个好的开始。
你也可以退房 this CodeProject article .

Paul 15 年前

正如Guffa所指出的,差异主要是边界检查。

为了确保边界检查不会破坏性能,您还可以在 unsafe 块,这将消除边界检查。为此,您需要使用/unsafe选项进行编译。

inked 15 年前

    //pre-load the bits -- do this only ONCE
    byte[] baHi = new byte[16];
    baHi[0]=0;
    baHi[1] = 000 + 00 + 00 + 16; //0001
    baHi[2] = 000 + 00 + 32 + 00; //0010
    baHi[3] = 000 + 00 + 32 + 16; //0011
    baHi[4] = 000 + 64 + 00 + 00; //0100
    baHi[5] = 000 + 64 + 00 + 16; //0101
    baHi[6] = 000 + 64 + 32 + 00; //0110
    baHi[7] = 000 + 64 + 32 + 16; //0111
    baHi[8] = 128 + 00 + 00 + 00; //1000
    baHi[9] = 128 + 00 + 00 + 16; //1001
    //not needed for 0-9
    //baHi[10] = 128 + 00 + 32 + 00; //1010
    //baHi[11] = 128 + 00 + 32 + 16; //1011
    //baHi[12] = 128 + 64 + 00 + 00; //1100
    //baHi[13] = 128 + 64 + 00 + 16; //1101
    //baHi[14] = 128 + 64 + 32 + 00; //1110
    //baHi[15] = 128 + 64 + 32 + 16; //1111

    //-------------------------------------------------------------------------
    //START PACKING
    //load TWO digits (0-9) at a time
    //this means if you're loading a big number from
    //a file, you read two digits at a time
    //and put them into bLoVal and bHiVal
    //230942034371231235  see that '37' in the middle?
    //         ^^
    //

    byte bHiVal = 3; //0000 0011 
    byte bLoVal = 7; //0000 1011 

    byte bShiftedLeftHiVal = (byte)baHi[bHiVal]; //0011 0000  =3, shifted (48)    

    //fuse the two together into a single byte
    byte bNewVal = (byte)(bShiftedLeftHiVal + bLoVal); //0011 1011 = 55 decimal

    //now store bNewVal wherever you want to store it
    //for later retrieval, like a byte array
    //END PACKING

    //-------------------------------------------------------------------------


    Response.Write("PACKING: hi: " + bHiVal + " lo: " + bLoVal + " packed: " + bNewVal);
    Response.Write("<br>");

    //-------------------------------------------------------------------------
    //START UNPACKING
    byte bUnpackedLoByte = (byte)(bNewVal & 15);  //will yield 7
    byte bUnpackedHiByte = (byte)(bNewVal & 240); //will yield 48

    //now we need to change '48' back into '3'
    string sHiBits = "00000000" + Convert.ToString(bUnpackedHiByte, 2); //drops leading 0s, so we pad...
    sHiBits = sHiBits.Substring(sHiBits.Length - 8, 8);                 //and get the last 8 characters
    sHiBits = ("0000" + sHiBits).Substring(0, 8);                       //shift right
    bUnpackedHiByte = (byte)Convert.ToSByte(sHiBits, 2);                //and, finally, get back the original byte

    //the above method, reworked, could also be used to PACK the data,
    //though it might be slower than hitting an array.
    //You can also loop through baHi to unpack, comparing the original
    //bUnpackedHyByte to the contents of the array and return
    //the index of where you found it (the index would be the 
    //unpacked digit)

    Response.Write("UNPACKING: input: " + bNewVal + " hi: " + bUnpackedHiByte + " lo: " + bUnpackedLoByte);

    //now create your output with bUnpackedHiByte and bUnpackedLoByte,
    //then move on to the next two bytes in where ever you stored the
    //really big number
    //END UNPACKING
    //-------------------------------------------------------------------------

即使你在原来的解决方案中把int改为short,你的内存需求也会减少一半,上面的内容会把内存减少到几乎一个最小值(我敢肯定会有人对浪费的字节大喊大叫)。