
Understanding Linux virtual memory: valgrind's massif output shows major differences with and without --pages-as-heap

  •  9
  • Kiril Kirov  ·  6 years ago

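    With --pages-as-heap=yes, massif reports about 7 GB of memory usage; when the option is disabled, the reported usage is about 160 KB.

    top also shows about 7 GB, which confirms the result with pages-as-heap=yes.

    (I have a theory, but I don't believe it explains such a huge difference, so I'm asking for help.)

    What bothers me in particular is that most of the reported memory usage is attributed to std::string, while what? is never printed (meaning the actual capacity is pretty small).

    I do need to use pages-as-heap=yes.


    Code snippet: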

    #include <iostream>
    #include <thread>
    #include <vector>
    #include <chrono>
    
    void run()
    {
        while (true)
        {
            std::string s;
            s += "aaaaa";
            s += "aaaaaaaaaaaaaaa";
            s += "bbbbbbbbbb";
            s += "cccccccccccccccccccccccccccccccccccccccccccccccccc";
            if (s.capacity() > 1024) std::cout << "what?" << std::endl;
    
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
    
    int main()
    {
        std::vector<std::thread> workers;
        for( unsigned i = 0; i < 192; ++i ) workers.push_back(std::thread(&run));
    
        workers.back().join();
    }
    

    Compiled with: g++ --std=c++11 -fno-inline -g3 -pthread

    With pages-as-heap=yes:

    100.00% (7,257,714,688B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
    ->99.75% (7,239,757,824B) 0x54E0679: mmap (mmap.c:34)
    | ->53.63% (3,892,314,112B) 0x545C3CF: new_heap (arena.c:438)
    | | ->53.63% (3,892,314,112B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
    | |   ->53.63% (3,892,314,112B) 0x5463248: malloc (malloc.c:2911)
    | |     ->53.63% (3,892,314,112B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |       ->53.63% (3,892,314,112B) 0x4CF8E37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |         ->53.63% (3,892,314,112B) 0x4CF9C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |           ->53.63% (3,892,314,112B) 0x4CF9D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |             ->53.63% (3,892,314,112B) 0x4CF9FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |               ->53.63% (3,892,314,112B) 0x401252: run() (test.cpp:11)
    | |                 ->53.63% (3,892,314,112B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
    | |                   ->53.63% (3,892,314,112B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
    | |                     ->53.63% (3,892,314,112B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
    | |                       ->53.63% (3,892,314,112B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |                         ->53.63% (3,892,314,112B) 0x51C96B8: start_thread (pthread_create.c:333)
    | |                           ->53.63% (3,892,314,112B) 0x54E63DB: clone (clone.S:109)
    | |                             
    | ->35.14% (2,550,136,832B) 0x545C35B: new_heap (arena.c:427)
    | | ->35.14% (2,550,136,832B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
    | |   ->35.14% (2,550,136,832B) 0x5463248: malloc (malloc.c:2911)
    | |     ->35.14% (2,550,136,832B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |       ->35.14% (2,550,136,832B) 0x4CF8E37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |         ->35.14% (2,550,136,832B) 0x4CF9C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |           ->35.14% (2,550,136,832B) 0x4CF9D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |             ->35.14% (2,550,136,832B) 0x4CF9FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |               ->35.14% (2,550,136,832B) 0x401252: run() (test.cpp:11)
    | |                 ->35.14% (2,550,136,832B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
    | |                   ->35.14% (2,550,136,832B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
    | |                     ->35.14% (2,550,136,832B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
    | |                       ->35.14% (2,550,136,832B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |                         ->35.14% (2,550,136,832B) 0x51C96B8: start_thread (pthread_create.c:333)
    | |                           ->35.14% (2,550,136,832B) 0x54E63DB: clone (clone.S:109)
    | |                             
    | ->10.99% (797,306,880B) 0x51CA1D4: pthread_create@@GLIBC_2.2.5 (allocatestack.c:513)
    |   ->10.99% (797,306,880B) 0x4CE2DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |     ->10.99% (797,306,880B) 0x4CE2ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |       ->10.99% (797,306,880B) 0x401BEA: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
    |         ->10.99% (797,306,880B) 0x401353: main (test.cpp:24)
    |           
    ->00.25% (17,956,864B) in 1+ places, all below ms_print's threshold (01.00%)
    

    With pages-as-heap=no:

    96.38% (159,289B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    ->43.99% (72,704B) 0x4EBAEFE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | ->43.99% (72,704B) 0x40106B8: call_init.part.0 (dl-init.c:72)
    |   ->43.99% (72,704B) 0x40107C9: _dl_init (dl-init.c:30)
    |     ->43.99% (72,704B) 0x4000C68: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
    |       
    ->33.46% (55,296B) 0x40138A3: _dl_allocate_tls (dl-tls.c:322)
    | ->33.46% (55,296B) 0x53D126D: pthread_create@@GLIBC_2.2.5 (allocatestack.c:588)
    |   ->33.46% (55,296B) 0x4EE9DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |     ->33.46% (55,296B) 0x4EE9ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |       ->33.46% (55,296B) 0x401BEA: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
    |         ->33.46% (55,296B) 0x401353: main (test.cpp:24)
    |           
    ->12.12% (20,025B) 0x4EFFE37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | ->12.12% (20,025B) 0x4F00C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |   ->12.12% (20,025B) 0x4F00D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |     ->12.12% (20,025B) 0x4F00FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |       ->12.07% (19,950B) 0x401285: run() (test.cpp:14)
    |       | ->12.07% (19,950B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
    |       |   ->12.07% (19,950B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
    |       |     ->12.07% (19,950B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
    |       |       ->12.07% (19,950B) 0x4EE9C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |       |         ->12.07% (19,950B) 0x53D06B8: start_thread (pthread_create.c:333)
    |       |           ->12.07% (19,950B) 0x56ED3DB: clone (clone.S:109)
    |       |             
    |       ->00.05% (75B) in 1+ places, all below ms_print's threshold (01.00%)
    |       
    ->05.58% (9,216B) 0x40315B: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) (new_allocator.h:104)
    | ->05.58% (9,216B) 0x402FC2: std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> > >::allocate(std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> >&, unsigned long) (alloc_traits.h:488)
    |   ->05.58% (9,216B) 0x402D4B: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::thread::_Impl<std::_Bind_simple<void (*())()> >*, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr_base.h:616)
    |     ->05.58% (9,216B) 0x402BDE: std::__shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> >, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr_base.h:1090)
    |       ->05.58% (9,216B) 0x402A76: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > >::shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr.h:316)
    |         ->05.58% (9,216B) 0x402771: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::allocate_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr.h:594)
    |           ->05.58% (9,216B) 0x402325: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::make_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::_Bind_simple<void (*())()> >(std::_Bind_simple<void (*())()>&&) (shared_ptr.h:610)
    |             ->05.58% (9,216B) 0x401F9C: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::thread::_M_make_routine<std::_Bind_simple<void (*())()> >(std::_Bind_simple<void (*())()>&&) (thread:196)
    |               ->05.58% (9,216B) 0x401BC4: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
    |                 ->05.58% (9,216B) 0x401353: main (test.cpp:24)
    |                   
    ->01.24% (2,048B) 0x402C9A: __gnu_cxx::new_allocator<std::thread>::allocate(unsigned long, void const*) (new_allocator.h:104)
      ->01.24% (2,048B) 0x402AF5: std::allocator_traits<std::allocator<std::thread> >::allocate(std::allocator<std::thread>&, unsigned long) (alloc_traits.h:488)
        ->01.24% (2,048B) 0x402928: std::_Vector_base<std::thread, std::allocator<std::thread> >::_M_allocate(unsigned long) (stl_vector.h:170)
          ->01.24% (2,048B) 0x40244E: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<std::thread>(std::thread&&) (vector.tcc:412)
            ->01.24% (2,048B) 0x40206D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<std::thread>(std::thread&&) (vector.tcc:101)
              ->01.24% (2,048B) 0x401C82: std::vector<std::thread, std::allocator<std::thread> >::push_back(std::thread&&) (stl_vector.h:932)
                ->01.24% (2,048B) 0x401366: main (test.cpp:24)
    

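    Please ignore the crappy handling of the threads; this is just a very short example.


    UPDATE

    It appears that this is not related to std::string at all. As @Lawrence suggested, the same thing can be reproduced by simply allocating a single int on the heap (with new). I believe @Lawrence is very close to the real answer here, so I am quoting his comments (easier for further readers):

    @KirilKirov The string allocation is not actually taking that much space... Each thread gets its initial stack, and then heap access maps additional space that is reflected in the total. Just declare one string and then have a spin loop... the same virtual memory usage is shown. - Lawrence, Sep 28 at 14:51

    Me:

    @Lawrence - you are right! OK, so are you saying that on each thread, at the first heap allocation, the memory manager (or the OS, or whatever) dedicates a huge chunk of memory for the thread's heap needs? And this chunk will be reused later (or shrunk, if necessary)? - Kiril Kirov, Sep 28 at 15:45

    Lawrence:

    @KirilKirov Something of that nature... the exact allocations probably depend on the malloc implementation and whatnot. - Lawrence, 2 days ago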

    4 answers  |  as of 6 years ago
        1
  •  4
  •   Lawrence    6 years ago

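    massif with --pages-as-heap=yes and the top column you are observing both measure the virtual memory used by a process. This virtual memory includes all the space mmap'd by the implementation of malloc and during the creation of threads. For example, the default stack size for a thread is 8192k, which is reflected in the creation of each thread and contributes to the virtual memory footprint.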

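    The first heap allocation on a new thread appears to mmap roughly 65 megabytes of space. This can be seen by looking at the pmap output for the process.

    An excerpt from a program very similar to the example: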

    75170:   ./a.out
    0000000000400000     24K r-x-- a.out
    0000000000605000      4K r---- a.out
    0000000000606000      4K rw--- a.out
    0000000001b6a000    200K rw---   [ anon ]
    00007f669dfa4000      4K -----   [ anon ]
    00007f669dfa5000   8192K rw---   [ anon ]
    00007f669e7a5000      4K -----   [ anon ]
    00007f669e7a6000   8192K rw---   [ anon ]
    00007f669efa6000      4K -----   [ anon ]
    00007f669efa7000   8192K rw---   [ anon ]
    ...
    00007f66cb800000   8192K rw---   [ anon ]
    00007f66cc000000    132K rw---   [ anon ]
    00007f66cc021000  65404K -----   [ anon ]
    00007f66d0000000    132K rw---   [ anon ]
    00007f66d0021000  65404K -----   [ anon ]
    00007f66d4000000    132K rw---   [ anon ]
    00007f66d4021000  65404K -----   [ anon ]
    ...
    00007f6880586000   8192K rw---   [ anon ]
    00007f6880d86000   1056K r-x-- libm-2.23.so
    00007f6880e8e000   2044K ----- libm-2.23.so
    ...
    00007f6881c08000      4K r---- libpthread-2.23.so
    00007f6881c09000      4K rw--- libpthread-2.23.so
    00007f6881c0a000     16K rw---   [ anon ]
    00007f6881c0e000    152K r-x-- ld-2.23.so
    00007f6881e09000     24K rw---   [ anon ]
    00007f6881e33000      4K r---- ld-2.23.so
    00007f6881e34000      4K rw--- ld-2.23.so
    00007f6881e35000      4K rw---   [ anon ]
    00007ffe9d75b000    132K rw---   [ stack ]
    00007ffe9d7f8000     12K r----   [ anon ]
    00007ffe9d7fb000      8K r-x--   [ anon ]
    ffffffffff600000      4K r-x--   [ anon ]
     total          7815008K
    

        2
  •  6
  •   Kiril Kirov    6 years ago

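    I'll try to write a short summary of what I learned while trying to figure out what is going on.
    This answer is possible thanks to @Lawrence - thanks!


    TL;DR

    This has nothing to do with std::string.
    It is all about glibc's memory allocator, which by default reserves large regions of virtual memory on a thread's first dynamic allocation.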


    MCVE

    #include <thread>
    #include <vector>
    #include <chrono>
    
    int main() {
        std::vector<std::thread> workers;
        for( unsigned i = 0; i < 192; ++i )
            workers.emplace_back([]{
                const auto x = std::make_unique<int>(rand());
                while (true) std::this_thread::sleep_for(std::chrono::seconds(1));});
        workers.back().join();
    }
    

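    Commands

    Compile: g++ --std=c++14 -fno-inline -g3 -O0 -pthread test.cpp
    Run: valgrind --tool=massif --pages-as-heap=[no|yes] ./a.out

    Memory usage

    top shows 7'815'012 KiB of virtual memory.
    pmap also shows 7'815'016 KiB of virtual memory.
    massif with pages-as-heap=yes shows a similar result: 7'817'088 KiB.
    On the other hand, massif with pages-as-heap=no is drastically different: about 133 KiB!

    Massif output with pages-as-heap=yes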

    100.00% (8,004,698,112B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
    ->99.78% (7,986,741,248B) 0x54E0679: mmap (mmap.c:34)
    | ->46.11% (3,690,987,520B) 0x545C3CF: new_heap (arena.c:438)
    | | ->46.11% (3,690,987,520B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
    | |   ->46.11% (3,690,987,520B) 0x5463248: malloc (malloc.c:2911)
    | |     ->46.11% (3,690,987,520B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |       ->46.11% (3,690,987,520B) 0x4026D0: std::_MakeUniq<int>::__single_object std::make_unique<int, int>(int&&) (unique_ptr.h:765)
    | |         ->46.11% (3,690,987,520B) 0x400EC5: main::{lambda()
    | |           ->46.11% (3,690,987,520B) 0x40225C: void std::_Bind_simple<main::{lambda()
    | |             ->46.11% (3,690,987,520B) 0x402194: std::_Bind_simple<main::{lambda()
    | |               ->46.11% (3,690,987,520B) 0x402102: std::thread::_Impl<std::_Bind_simple<main::{lambda()
    | |                 ->46.11% (3,690,987,520B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |                   ->46.11% (3,690,987,520B) 0x51C96B8: start_thread (pthread_create.c:333)
    | |                     ->46.11% (3,690,987,520B) 0x54E63DB: clone (clone.S:109)
    | |                       
    | ->33.53% (2,684,354,560B) 0x545C35B: new_heap (arena.c:427)
    | | ->33.53% (2,684,354,560B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
    | |   ->33.53% (2,684,354,560B) 0x5463248: malloc (malloc.c:2911)
    | |     ->33.53% (2,684,354,560B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |       ->33.53% (2,684,354,560B) 0x4026D0: std::_MakeUniq<int>::__single_object std::make_unique<int, int>(int&&) (unique_ptr.h:765)
    | |         ->33.53% (2,684,354,560B) 0x400EC5: main::{lambda()
    | |           ->33.53% (2,684,354,560B) 0x40225C: void std::_Bind_simple<main::{lambda()
    | |             ->33.53% (2,684,354,560B) 0x402194: std::_Bind_simple<main::{lambda()
    | |               ->33.53% (2,684,354,560B) 0x402102: std::thread::_Impl<std::_Bind_simple<main::{lambda()
    | |                 ->33.53% (2,684,354,560B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | |                   ->33.53% (2,684,354,560B) 0x51C96B8: start_thread (pthread_create.c:333)
    | |                     ->33.53% (2,684,354,560B) 0x54E63DB: clone (clone.S:109)
    | |                       
    | ->20.13% (1,611,399,168B) 0x51CA1D4: pthread_create@@GLIBC_2.2.5 (allocatestack.c:513)
    |   ->20.13% (1,611,399,168B) 0x4CE2DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |     ->20.13% (1,611,399,168B) 0x4CE2ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |       ->20.13% (1,611,399,168B) 0x40139A: std::thread::thread<main::{lambda()
    |         ->20.13% (1,611,399,168B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
    |           ->20.13% (1,611,399,168B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
    |             ->19.19% (1,535,864,832B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
    |             | ->19.19% (1,535,864,832B) 0x400F47: main (test.cpp:10)
    |             |   
    |             ->00.94% (75,534,336B) in 1+ places, all below ms_print's threshold (01.00%)
    |             
    ->00.22% (17,956,864B) in 1+ places, all below ms_print's threshold (01.00%)
    

    Massif output with pages-as-heap=no

    Memory usage just before terminating the program:

    --------------------------------------------------------------------------------
      n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
    --------------------------------------------------------------------------------
     68      2,793,125          143,280          136,676         6,604            0
    95.39% (136,676B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    ->50.74% (72,704B) 0x4EBAEFE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    | ->50.74% (72,704B) 0x40106B8: call_init.part.0 (dl-init.c:72)
    |   ->50.74% (72,704B) 0x40107C9: _dl_init (dl-init.c:30)
    |     ->50.74% (72,704B) 0x4000C68: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
    |       
    ->36.58% (52,416B) 0x40138A3: _dl_allocate_tls (dl-tls.c:322)
    | ->36.58% (52,416B) 0x53D126D: pthread_create@@GLIBC_2.2.5 (allocatestack.c:588)
    |   ->36.58% (52,416B) 0x4EE9DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |     ->36.58% (52,416B) 0x4EE9ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
    |       ->36.58% (52,416B) 0x40139A: std::thread::thread<main::{lambda()
    |         ->36.58% (52,416B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
    |           ->36.58% (52,416B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
    |             ->34.77% (49,824B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
    |             | ->34.77% (49,824B) 0x400F47: main (test.cpp:10)
    |             |   
    |             ->01.81% (2,592B) 0x4010FF: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<main::{lambda()
    |               ->01.81% (2,592B) 0x40103D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
    |                 ->01.81% (2,592B) 0x400F47: main (test.cpp:10)
    |                   
    ->06.13% (8,784B) 0x401B4B: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    | ->06.13% (8,784B) 0x401A60: std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    |   ->06.13% (8,784B) 0x40194D: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    |     ->06.13% (8,784B) 0x401894: std::__shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    |       ->06.13% (8,784B) 0x40183A: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    |         ->06.13% (8,784B) 0x4017C7: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    |           ->06.13% (8,784B) 0x4016AB: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    |             ->06.13% (8,784B) 0x40155E: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
    |               ->06.13% (8,784B) 0x401374: std::thread::thread<main::{lambda()
    |                 ->06.13% (8,784B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
    |                   ->06.13% (8,784B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
    |                     ->05.83% (8,352B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
    |                     | ->05.83% (8,352B) 0x400F47: main (test.cpp:10)
    |                     |   
    |                     ->00.30% (432B) in 1+ places, all below ms_print's threshold (01.00%)
    |                     
    ->01.43% (2,048B) 0x403432: __gnu_cxx::new_allocator<std::thread>::allocate(unsigned long, void const*) (new_allocator.h:104)
    | ->01.43% (2,048B) 0x4032CF: std::allocator_traits<std::allocator<std::thread> >::allocate(std::allocator<std::thread>&, unsigned long) (alloc_traits.h:488)
    |   ->01.43% (2,048B) 0x4030B8: std::_Vector_base<std::thread, std::allocator<std::thread> >::_M_allocate(unsigned long) (stl_vector.h:170)
    |     ->01.43% (2,048B) 0x4010B6: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<main::{lambda()
    |       ->01.43% (2,048B) 0x40103D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
    |         ->01.43% (2,048B) 0x400F47: main (test.cpp:10)
    |           
    ->00.51% (724B) in 1+ places, all below ms_print's threshold (01.00%)
    

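    What the heck is going on?

    pages-as-heap=no

    Things look reasonable here, so let's not dig into it. As expected, everything ends up in malloc/new/new[], and the memory usage is small enough not to worry us; these are the high-level allocations.

    pages-as-heap=yes

    But look at pages-as-heap=yes: ~8 GiB of virtual memory for this simple code?

    pthread_create

    Let's start with the simpler one: the stack that ends with pthread_create.

    massif reports 1,611,399,168 bytes for pthread_create, which is 192 * 8'196 KiB, i.e. 192 threads * roughly 8 MiB, which is the default max stack size of a thread in Linux.

    Note that 8'196 KiB is not exactly 8 MiB (8'192 KiB). I don't know where this difference comes from, but it is not important at the moment.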

    std::make_unique<int>

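    OK, let's look at the other two stacks... wait, they are exactly the same? Yes, massif's documentation explains this; I don't completely understand it, but it is not important either. They show exactly the same stack. Let's combine the results and examine them together.

    The two stacks combined account for 6'375'342'080 bytes, and all of them are caused by our simple std::make_unique<int>!

    Let's take a step back: if we run the same experiment, but with a single thread, we'll see that this int allocation causes an allocation of 67'108'864 bytes of memory, which is exactly 64 MiB.

    It all comes down to the implementation of malloc (new/new[] is internally implemented with malloc by default).

    malloc internally uses a memory allocator called ptmalloc2.

    Simply put, this allocator deals with the following terms:

    • per thread arena: a huge area of memory; usually one per thread, for performance reasons; not all software threads have their own arena, which usually depends on the number of hardware threads (and other details, I guess);
    • heap: arenas are divided into heaps;
    • chunks: heaps are divided into chunks.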

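    There are a lot of details about these things; I'll post some interesting links later, although this should be enough for the reader to do their own research. These are really low-level and deep things, related to C++ memory management.

    So, let's go back to our test with a single thread: 64 MiB allocated for a single int?? Let's look at the stack trace again and concentrate on its end: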

    mmap (mmap.c:34)
    new_heap (arena.c:438)
    arena_get2.part.3 (arena.c:646)
    malloc (malloc.c:2911)
    

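    Surprise, surprise: malloc calls arena_get2, which calls new_heap, which leads us to mmap (mmap and brk are the low-level system calls used for memory allocation in Linux). And this is reported to allocate exactly 64 MiB of memory.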

    Now, back to the original example with 192 threads: 6'375'342'080 bytes is exactly 95 * 64 MiB!

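    You can dig deeper if necessary.

    A really cool explanatory article: Understanding glibc malloc

    More formal/official documentation: The GNU allocator

    A cool Stack Exchange question: How does glibc malloc works

    Others:

    If some of these links are broken by the time you read this post, it should be pretty easy to find similar articles. This topic is very popular, if you know what to look for and how.

    I hope these observations describe the whole picture well and also give enough food for further deep investigation.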

        3
  •  0
  •   Paul Floyd    6 years ago

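    This is only a "sort of" answer (from the Valgrind perspective). The problem of memory pools, in particular with C++ strings, has been known for some time. The Valgrind manual has a section on leaks with C++ strings, suggesting that you try setting the GLIBCXX_FORCE_NEW environment variable.

    In addition, for GCC 6 and later, Valgrind has added hooks into libstdc++ to clean up memory that is still reachable. The Valgrind bugzilla entry is here (see also here).

    I don't see why such small allocations would blow up to so many gigabytes (over 12 GB for a 64-bit executable on CentOS 6.6 with GCC 6.2).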

        4
  •  0
  •   user3344003    6 years ago

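    --pages-as-heap=<yes|no> [default: no] Tells Massif to profile memory at the page level rather than at the malloc'd block level. See above for details.

    If yes, you are measuring pages. If no, you are measuring malloc'd blocks.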