
Is it possible to exceed the bandwidth limit when using multiple cores?

  • Dusol · 6 years ago

    I am testing bandwidth using the fio benchmarking tool. Here is my hardware spec:

    1. 2 sockets, 10 cores per socket
    2. Kernel version: 4.8.17
    3. Intel SSD 750 Series

    CPU: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz; SSD: Intel SSD 750 Series, 400GB, 20nm Intel NAND Flash MLC, NVMe PCIe 3.0 x4 add-in card.

    When laying out the fio job, I invalidate the buffer/page cache for the file in use before starting I/O (invalidate=1).

    I use the O_DIRECT flag (non-buffered I/O) to bypass the page cache, and I use Linux native asynchronous I/O (libaio).

    When I test with a single core, the fio output shows that core 0 achieves a bandwidth of 1516.7MB/s.

    This does not exceed the Intel SSD 750's bandwidth limit, so it looks fine.

    Below is the test1 job file.

    [global]
    filename=/dev/nvme0n1
    runtime=10
    bs=4k
    # Linux native asynchronous I/O
    ioengine=libaio
    # O_DIRECT: bypass the page cache
    direct=1
    iodepth=64
    # drop the page cache for this file before I/O starts
    invalidate=1
    randrepeat=0
    log_avg_msec=1000
    time_based
    thread=1
    # restrict I/O to the first 256MB of the device
    size=256m


    [job1]
    # pin this job to core 0
    cpus_allowed=0
    rw=randread
    

    However, when I run the same test on 3 cores, the combined bandwidth of the cores exceeds the Intel SSD 750's bandwidth limit.

    The total bandwidth across the 3 cores is about 3000MB/s.

    According to the Intel SSD 750 specification, my SSD's read bandwidth limit is 2200MB/s.
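
    (Summing the three per-job figures in the output below, 1021.2 + 1053.1 + 1036.8 ≈ 3111MB/s, which matches the reported aggrb=3111.9MB/s and is roughly 900MB/s over the 2200MB/s spec.)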

    Below is the test2 job file (3 cores).

    [global]
    filename=/dev/nvme0n1
    runtime=10
    bs=4k
    ioengine=libaio
    direct=1
    iodepth=64
    invalidate=1
    randrepeat=0
    log_avg_msec=1000
    time_based
    thread=1
    size=256m
    
    
    [job1]
    cpus_allowed=0
    rw=randread
    
    [job2]
    cpus_allowed=1
    rw=randread
    
    [job3]
    cpus_allowed=2
    rw=randread
    

    I don't understand how this can happen.


    Below is the fio output for test1.

    job1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
    fio-2.2.10
    Starting 1 thread
    
    job1: (groupid=0, jobs=1): err= 0: pid=6924: Mon Jan 29 20:14:33 2018
      read : io=15139MB, bw=1513.8MB/s, iops=387516, runt= 10001msec
        slat (usec): min=0, max=42, avg= 1.97, stdev= 1.12
        clat (usec): min=5, max=1072, avg=162.70, stdev=20.17
         lat (usec): min=6, max=1073, avg=164.74, stdev=20.39
        clat percentiles (usec):
         |  1.00th=[  141],  5.00th=[  145], 10.00th=[  149], 20.00th=[  151],
         | 30.00th=[  155], 40.00th=[  157], 50.00th=[  159], 60.00th=[  161],
         | 70.00th=[  165], 80.00th=[  169], 90.00th=[  179], 95.00th=[  211],
         | 99.00th=[  229], 99.50th=[  262], 99.90th=[  318], 99.95th=[  318],
         | 99.99th=[  334]
        lat (usec) : 10=0.01%, 20=0.01%, 50=0.02%, 100=0.03%, 250=99.35%
        lat (usec) : 500=0.60%, 1000=0.01%
        lat (msec) : 2=0.01%
      cpu          : usr=22.32%, sys=77.64%, ctx=102, majf=0, minf=421
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
         issued    : total=r=3875556/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=64
    
    Run status group 0 (all jobs):
       READ: io=15139MB, aggrb=1513.8MB/s, minb=1513.8MB/s, maxb=1513.8MB/s, mint=10001msec, maxt=10001msec
    
    Disk stats (read/write):
      nvme0n1: ios=3834624/0, merge=0/0, ticks=25164/0, in_queue=25184, util=99.61% 
    

    Here is the fio output for test2 (3 cores).

    job1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
    job2: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
    job3: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
    fio-2.2.10
    Starting 3 threads
    
    job1: (groupid=0, jobs=1): err= 0: pid=6968: Mon Jan 29 20:14:53 2018
      read : io=10212MB, bw=1021.2MB/s, iops=261413, runt= 10001msec
        slat (usec): min=1, max=140, avg= 2.49, stdev= 1.23
        clat (usec): min=4, max=970, avg=241.78, stdev=138.10
         lat (usec): min=7, max=972, avg=244.35, stdev=138.09
        clat percentiles (usec):
         |  1.00th=[   17],  5.00th=[   25], 10.00th=[   33], 20.00th=[   64],
         | 30.00th=[  135], 40.00th=[  225], 50.00th=[  306], 60.00th=[  330],
         | 70.00th=[  346], 80.00th=[  366], 90.00th=[  390], 95.00th=[  410],
         | 99.00th=[  438], 99.50th=[  446], 99.90th=[  474], 99.95th=[  502],
         | 99.99th=[  668]
        lat (usec) : 10=0.01%, 20=2.03%, 50=14.39%, 100=9.67%, 250=16.14%
        lat (usec) : 500=57.71%, 750=0.05%, 1000=0.01%
      cpu          : usr=17.32%, sys=71.84%, ctx=182182, majf=0, minf=318
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
         issued    : total=r=2614396/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=64
    job2: (groupid=0, jobs=1): err= 0: pid=6969: Mon Jan 29 20:14:53 2018
      read : io=10540MB, bw=1053.1MB/s, iops=269802, runt= 10001msec
        slat (usec): min=1, max=35, avg= 1.93, stdev= 0.97
        clat (usec): min=5, max=903, avg=234.55, stdev=139.14
         lat (usec): min=7, max=904, avg=236.56, stdev=139.13
        clat percentiles (usec):
         |  1.00th=[   16],  5.00th=[   22], 10.00th=[   30], 20.00th=[   57],
         | 30.00th=[  112], 40.00th=[  207], 50.00th=[  298], 60.00th=[  330],
         | 70.00th=[  346], 80.00th=[  362], 90.00th=[  386], 95.00th=[  402],
         | 99.00th=[  426], 99.50th=[  438], 99.90th=[  462], 99.95th=[  494],
         | 99.99th=[  628]
        lat (usec) : 10=0.01%, 20=3.22%, 50=14.51%, 100=10.76%, 250=15.48%
        lat (usec) : 500=55.97%, 750=0.05%, 1000=0.01%
      cpu          : usr=26.08%, sys=59.08%, ctx=377522, majf=0, minf=326
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
         issued    : total=r=2698293/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=64
    job3: (groupid=0, jobs=1): err= 0: pid=6970: Mon Jan 29 20:14:53 2018
      read : io=10368MB, bw=1036.8MB/s, iops=265406, runt= 10001msec
        slat (usec): min=1, max=102, avg= 2.48, stdev= 1.24
        clat (usec): min=5, max=874, avg=238.10, stdev=139.10
         lat (usec): min=7, max=877, avg=240.66, stdev=139.09
        clat percentiles (usec):
         |  1.00th=[   18],  5.00th=[   27], 10.00th=[   39], 20.00th=[   72],
         | 30.00th=[  113], 40.00th=[  193], 50.00th=[  290], 60.00th=[  330],
         | 70.00th=[  350], 80.00th=[  370], 90.00th=[  398], 95.00th=[  414],
         | 99.00th=[  442], 99.50th=[  454], 99.90th=[  474], 99.95th=[  498],
         | 99.99th=[  628]
        lat (usec) : 10=0.01%, 20=1.51%, 50=12.00%, 100=13.78%, 250=17.81%
        lat (usec) : 500=54.84%, 750=0.05%, 1000=0.01%
      cpu          : usr=17.96%, sys=71.88%, ctx=170809, majf=0, minf=319
      IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
         issued    : total=r=2654335/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=64
    
    Run status group 0 (all jobs):
       READ: io=31121MB, aggrb=3111.9MB/s, minb=1021.2MB/s, maxb=1053.1MB/s, mint=10001msec, maxt=10001msec
    
    Disk stats (read/write):
      nvme0n1: ios=7883218/0, merge=0/0, ticks=1730536/0, in_queue=1763060, util=99.52%
    
    1 Answer  |  6 years ago
  •  Peter Cordes, Steve Bohrer · 6 years ago

    Hmm...

    @PeterCordes makes a good point about (device) caching. A quick Google search turns up https://www.techspot.com/review/984-intel-ssd-750-series/ which says:

    Also on board are five Micron D9PQL DRAM chips which act as a 1.25GB cache, and the spec sheet says this is DDR3-1600 memory.

    Given that you restricted fio to working in the same 256MB region for all threads (each job shares filename=/dev/nvme0n1 with size=256m and the default offset of 0, so the combined working set is a single 256MB region, far smaller than that 1.25GB onboard cache), all of your I/O could easily be served from the device's cache. There is no dedicated way to discard the device's cache (unlike Linux's buffer cache), so I'd recommend making your working area dramatically bigger (e.g. tens to hundreds of GB) to reduce the likelihood of one thread's data being cached or prefetched by another thread's accesses, as sketched below.
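
    For example, the global section could use a much larger region (the size=100g value below is just an illustrative choice; anything large relative to the cache and within the 400GB device works):

    [global]
    filename=/dev/nvme0n1
    runtime=10
    bs=4k
    ioengine=libaio
    direct=1
    iodepth=64
    invalidate=1
    randrepeat=0
    log_avg_msec=1000
    time_based
    thread=1
    # spread the random reads over 100GB instead of 256MB so they
    # cannot all be served from the SSD's 1.25GB DRAM cache
    size=100g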

    Additionally, I would ask: "what data did you put down on the SSD before reading it back?" SSDs are often "thin" in the sense that they can know which regions have never been written, or which regions have been explicitly discarded. Reading from such regions means the SSD has almost no work to do and can return data extremely quickly (much like an OS reading from a hole in a sparse file). In "real life" you rarely choose to read back something you never wrote, so doing this will skew your results; a preconditioning pass like the one sketched below avoids it.
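
    For example, a preconditioning job that sequentially writes the whole device before the read tests (the job name and 1MB block size are arbitrary choices here):

    # WARNING: this overwrites all data on the device
    [precondition]
    filename=/dev/nvme0n1
    rw=write
    bs=1m
    ioengine=libaio
    iodepth=32
    direct=1

    With no size option given, fio defaults to the full size of the device, so this writes all 400GB once.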