I'll try to write a short summary of what I learned while trying to figure out what was going on.
This answer was made possible by @Lawrence. Thanks!
TL;DR
This has nothing to do with `std::string`.
It comes down to `glibc`'s memory allocator.
MCVE
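#include <chrono>
#include <cstdlib>   // rand
#include <memory>    // std::make_unique
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < 192; ++i)
        workers.emplace_back([] {
            // One tiny heap allocation per thread is enough to trigger the effect.
            const auto x = std::make_unique<int>(rand());
            // Keep the thread alive so its memory stays mapped.
            while (true) std::this_thread::sleep_for(std::chrono::seconds(1));
        });
    workers.back().join();  // never returns; stop the program manually
}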
Commands
Compile:
g++ --std=c++14 -fno-inline -g3 -O0 -pthread test.cpp
Run under massif:
valgrind --tool=massif --pages-as-heap=[no|yes] ./a.out
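Memory usage
`top` shows 7'815'012 KiB of virtual memory.
`pmap` also shows 7'815'016 KiB of virtual memory.
`massif` with `pages-as-heap=yes` shows a similar figure: 7'817'088 KiB (see the output below).
`massif` with `pages-as-heap=no`, on the other hand, is drastically different: around 133 KiB!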
Massif output with pages-as-heap=yes
100.00% (8,004,698,112B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->99.78% (7,986,741,248B) 0x54E0679: mmap (mmap.c:34)
| ->46.11% (3,690,987,520B) 0x545C3CF: new_heap (arena.c:438)
| | ->46.11% (3,690,987,520B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->46.11% (3,690,987,520B) 0x5463248: malloc (malloc.c:2911)
| | ->46.11% (3,690,987,520B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->46.11% (3,690,987,520B) 0x4026D0: std::_MakeUniq<int>::__single_object std::make_unique<int, int>(int&&) (unique_ptr.h:765)
| | ->46.11% (3,690,987,520B) 0x400EC5: main::{lambda()
| | ->46.11% (3,690,987,520B) 0x40225C: void std::_Bind_simple<main::{lambda()
| | ->46.11% (3,690,987,520B) 0x402194: std::_Bind_simple<main::{lambda()
| | ->46.11% (3,690,987,520B) 0x402102: std::thread::_Impl<std::_Bind_simple<main::{lambda()
| | ->46.11% (3,690,987,520B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->46.11% (3,690,987,520B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->46.11% (3,690,987,520B) 0x54E63DB: clone (clone.S:109)
| |
| ->33.53% (2,684,354,560B) 0x545C35B: new_heap (arena.c:427)
| | ->33.53% (2,684,354,560B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->33.53% (2,684,354,560B) 0x5463248: malloc (malloc.c:2911)
| | ->33.53% (2,684,354,560B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->33.53% (2,684,354,560B) 0x4026D0: std::_MakeUniq<int>::__single_object std::make_unique<int, int>(int&&) (unique_ptr.h:765)
| | ->33.53% (2,684,354,560B) 0x400EC5: main::{lambda()
| | ->33.53% (2,684,354,560B) 0x40225C: void std::_Bind_simple<main::{lambda()
| | ->33.53% (2,684,354,560B) 0x402194: std::_Bind_simple<main::{lambda()
| | ->33.53% (2,684,354,560B) 0x402102: std::thread::_Impl<std::_Bind_simple<main::{lambda()
| | ->33.53% (2,684,354,560B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->33.53% (2,684,354,560B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->33.53% (2,684,354,560B) 0x54E63DB: clone (clone.S:109)
| |
| ->20.13% (1,611,399,168B) 0x51CA1D4: pthread_create@@GLIBC_2.2.5 (allocatestack.c:513)
| ->20.13% (1,611,399,168B) 0x4CE2DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->20.13% (1,611,399,168B) 0x4CE2ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->20.13% (1,611,399,168B) 0x40139A: std::thread::thread<main::{lambda()
| ->20.13% (1,611,399,168B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
| ->20.13% (1,611,399,168B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
| ->19.19% (1,535,864,832B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| | ->19.19% (1,535,864,832B) 0x400F47: main (test.cpp:10)
| |
| ->00.94% (75,534,336B) in 1+ places, all below ms_print's threshold (01.00%)
|
->00.22% (17,956,864B) in 1+ places, all below ms_print's threshold (01.00%)
Massif output with pages-as-heap=no
Memory usage before killing the program:
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
68 2,793,125 143,280 136,676 6,604 0
95.39% (136,676B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->50.74% (72,704B) 0x4EBAEFE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->50.74% (72,704B) 0x40106B8: call_init.part.0 (dl-init.c:72)
| ->50.74% (72,704B) 0x40107C9: _dl_init (dl-init.c:30)
| ->50.74% (72,704B) 0x4000C68: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
|
->36.58% (52,416B) 0x40138A3: _dl_allocate_tls (dl-tls.c:322)
| ->36.58% (52,416B) 0x53D126D: pthread_create@@GLIBC_2.2.5 (allocatestack.c:588)
| ->36.58% (52,416B) 0x4EE9DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->36.58% (52,416B) 0x4EE9ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->36.58% (52,416B) 0x40139A: std::thread::thread<main::{lambda()
| ->36.58% (52,416B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
| ->36.58% (52,416B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
| ->34.77% (49,824B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| | ->34.77% (49,824B) 0x400F47: main (test.cpp:10)
| |
| ->01.81% (2,592B) 0x4010FF: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<main::{lambda()
| ->01.81% (2,592B) 0x40103D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| ->01.81% (2,592B) 0x400F47: main (test.cpp:10)
|
->06.13% (8,784B) 0x401B4B: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x401A60: std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x40194D: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x401894: std::__shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x40183A: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x4017C7: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x4016AB: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x40155E: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x401374: std::thread::thread<main::{lambda()
| ->06.13% (8,784B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
| ->06.13% (8,784B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
| ->05.83% (8,352B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| | ->05.83% (8,352B) 0x400F47: main (test.cpp:10)
| |
| ->00.30% (432B) in 1+ places, all below ms_print's threshold (01.00%)
|
->01.43% (2,048B) 0x403432: __gnu_cxx::new_allocator<std::thread>::allocate(unsigned long, void const*) (new_allocator.h:104)
| ->01.43% (2,048B) 0x4032CF: std::allocator_traits<std::allocator<std::thread> >::allocate(std::allocator<std::thread>&, unsigned long) (alloc_traits.h:488)
| ->01.43% (2,048B) 0x4030B8: std::_Vector_base<std::thread, std::allocator<std::thread> >::_M_allocate(unsigned long) (stl_vector.h:170)
| ->01.43% (2,048B) 0x4010B6: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<main::{lambda()
| ->01.43% (2,048B) 0x40103D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| ->01.43% (2,048B) 0x400F47: main (test.cpp:10)
|
->00.51% (724B) in 1+ places, all below ms_print's threshold (01.00%)
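What the heck is happening?
With pages-as-heap=no
Things look pretty reasonable, so let's not examine this case. As expected, everything ends up in `malloc/new/new[]`, and the memory usage is small enough not to worry us. These are the high-level allocations.
With pages-as-heap=yes
But look at `pages-as-heap=yes`: ~8 GiB of virtual memory for this simple code?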
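pthread_create
Let's start with the simple one: the stack ending with `pthread_create`.
`massif` reports 1,611,399,168 bytes for it, which is 192 * 8'196 KiB: 192 threads at roughly 8 MiB each, which is the default max stack size of a thread in Linux.
Note that 8'196 KiB is not exactly 8 MiB (8'192 KiB). I don't know where this difference comes from, but it is not important at the moment.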
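The arithmetic above can be checked mechanically. The following is a minimal sketch: the constants are copied from the massif output, while the helper names (`per_thread_kib`, `stack_limit_kib`) are made up for illustration. It also reads the soft `RLIMIT_STACK` limit, on the assumption that this is the value the default thread stack size is derived from on a typical glibc system.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/resource.h>

// Figures copied from the massif output (pthread_create stack).
constexpr std::uint64_t kStackBytesTotal = 1611399168ULL;
constexpr std::uint64_t kThreads = 192;

// Hypothetical helper: per-thread share of a total, in KiB.
constexpr std::uint64_t per_thread_kib(std::uint64_t total_bytes,
                                       std::uint64_t threads) {
    return total_bytes / threads / 1024;
}

static_assert(kStackBytesTotal % kThreads == 0,
              "the total splits evenly across the 192 threads");
static_assert(per_thread_kib(kStackBytesTotal, kThreads) == 8196,
              "8'196 KiB per thread, i.e. 8 MiB plus 4 KiB");

// Hypothetical helper: soft stack limit in KiB, or 0 if it is
// unlimited or cannot be queried.
std::uint64_t stack_limit_kib() {
    rlimit lim{};
    if (getrlimit(RLIMIT_STACK, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY)
        return 0;
    return static_cast<std::uint64_t>(lim.rlim_cur) / 1024;
}
```

On a machine configured like the one in this answer, `stack_limit_kib()` would be expected to report 8192, but that depends on the local `ulimit -s` setting.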
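std::make_unique<int>
OK, let's look at the other two stacks... wait, they are exactly the same? Yes: the only visible difference is the line inside `new_heap` (arena.c:438 versus arena.c:427). The `massif` documentation explains this; I don't completely understand it, but it is also not important. Let's combine the results and examine them together.
Together, these two stacks account for 6'375'342'080 bytes, and all of them are caused by our simple `std::make_unique<int>`!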
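Let's take a step back: if we run the same experiment with a single thread, we'll see that this one `int` allocation causes an allocation of 67'108'864 bytes, which is exactly 64 MiB.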
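The single-thread experiment can also be reproduced without valgrind. This is a rough sketch that assumes a Linux system where `/proc/self/status` is readable; the helper names `vm_size_kib` and `vm_growth_for_one_thread_kib` are invented for this example. It compares the process's `VmSize` before the thread starts and right after the thread has allocated one `int`.

```cpp
#include <cstdlib>
#include <fstream>
#include <memory>
#include <string>
#include <thread>

// Returns this process's virtual memory size (VmSize) in KiB,
// or -1 if /proc/self/status cannot be read (Linux-only assumption).
long vm_size_kib() {
    std::ifstream status("/proc/self/status");
    std::string token;
    while (status >> token) {
        if (token == "VmSize:") {
            long kib = -1;
            status >> kib;
            return kib;
        }
    }
    return -1;
}

// Hypothetical helper: how much VmSize grows when one extra thread
// performs a single small allocation. Returns -1 if it cannot measure.
long vm_growth_for_one_thread_kib() {
    const long before = vm_size_kib();
    long after = -1;
    std::thread worker([&after] {
        const auto x = std::make_unique<int>(std::rand());
        after = vm_size_kib();  // measured while x is still alive
    });
    worker.join();
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

Calling `vm_growth_for_one_thread_kib()` from `main` and printing the result should show growth that covers both the new thread's stack and whatever the allocator reserved for it. The exact number will vary with the glibc version and system settings.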
It all comes down to the implementation of `malloc` (as we already know, `new/new[]` is implemented internally on top of `malloc`, by default).
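`malloc` internally uses a memory allocator called `ptmalloc2`.
Simply put, this allocator deals with the following terms:
- per thread arena: a huge area of memory; usually one per thread, for performance reasons; not all software threads get their own per thread arena, which usually depends on the number of hardware threads (and other details, I guess);
- heap: arenas are divided into heaps;
- chunks: heaps are divided into smaller memory areas called chunks.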
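There are a lot of details behind these terms. I'll post some interesting links below, although this should already be enough for readers to do their own research. These are really low-level and deep topics related to C++ memory management.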
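To make the arena concept a bit more tangible: glibc exposes a tunable that caps the number of arenas, `M_ARENA_MAX`, settable through `mallopt` (or through the `MALLOC_ARENA_MAX` environment variable). The sketch below is glibc-specific and was not part of the original experiment; the wrapper name `cap_malloc_arenas` is made up. The assumption is that with fewer arenas, fewer per-thread 64 MiB reservations would appear in the `pages-as-heap=yes` output, at the cost of more lock contention between threads.

```cpp
#include <cassert>
#include <malloc.h>  // glibc-specific: mallopt, M_ARENA_MAX

// Hypothetical wrapper: asks glibc to use at most `max_arenas` malloc
// arenas. It should be called early, before worker threads start
// allocating. Returns true if glibc accepted the setting.
bool cap_malloc_arenas(int max_arenas) {
    return mallopt(M_ARENA_MAX, max_arenas) == 1;
}
```

For example, calling `cap_malloc_arenas(2)` at the top of `main` in the MCVE, or running it with `MALLOC_ARENA_MAX=2 ./a.out`, would be one way to test whether the virtual memory figure really tracks the number of arenas.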
So, let's go back to our test with a single thread: 64 MiB allocated for a single `int`?? Let's look at the stack trace again and concentrate on its end:
mmap (mmap.c:34)
new_heap (arena.c:438)
arena_get2.part.3 (arena.c:646)
malloc (malloc.c:2911)
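Surprise, surprise: `malloc` calls `arena_get2`, which calls `new_heap`, which leads us to `mmap` (`mmap` and `brk` are the low-level system calls used for memory allocation on Linux). And this is reported to allocate exactly 64 MiB of memory.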
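Why such a large `mmap` costs almost nothing in practice can be illustrated with a simplified sketch. This is not glibc's actual `new_heap` code; it only assumes the general pattern of reserving a large range of address space up front and making a small prefix usable on demand. The names `reserve_region` and `commit_prefix` are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

constexpr std::size_t kReserveBytes = 64u * 1024u * 1024u;  // 64 MiB

// Reserves 64 MiB of address space without making it accessible.
// This counts towards virtual memory but needs no physical pages yet.
// Returns nullptr on failure.
void* reserve_region() {
    void* p = mmap(nullptr, kReserveBytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Makes the first `bytes` of the region readable and writable.
// Physical memory is only consumed once these pages are touched.
bool commit_prefix(void* region, std::size_t bytes) {
    return mprotect(region, bytes, PROT_READ | PROT_WRITE) == 0;
}
```

A tool that counts mapped pages, such as `top` (the VIRT column), `pmap`, or massif with `pages-as-heap=yes`, sees the whole 64 MiB as soon as `reserve_region()` returns, even though only the committed and touched part uses real memory.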
Now back to the original example with 192 threads and our big number, 6'375'342'080 bytes: this is exactly 95 * 64 MiB!
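As a sanity check, the numbers from the two `new_heap` stacks in the massif output can be verified at compile time. This is only arithmetic on the reported figures; the helper name `heaps_in` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;
constexpr std::uint64_t kHeapBytes = 64 * kMiB;  // 67'108'864 bytes

// Figures copied from the massif output with pages-as-heap=yes.
constexpr std::uint64_t kStackA = 3690987520ULL;  // new_heap (arena.c:438)
constexpr std::uint64_t kStackB = 2684354560ULL;  // new_heap (arena.c:427)

// Hypothetical helper: number of whole 64 MiB blocks in `bytes`.
constexpr std::uint64_t heaps_in(std::uint64_t bytes) {
    return bytes / kHeapBytes;
}

static_assert(kHeapBytes == 67108864ULL, "64 MiB in bytes");
static_assert(kStackA + kStackB == 6375342080ULL, "combined total");
static_assert((kStackA + kStackB) % kHeapBytes == 0,
              "the total is a whole number of 64 MiB blocks");
static_assert(heaps_in(kStackA + kStackB) == 95, "95 * 64 MiB");
```

Each stack is itself a whole multiple of 64 MiB: `heaps_in(kStackA)` is 55 and `heaps_in(kStackB)` is 40.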
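You can dig much deeper if necessary.
A really cool explanatory article: Understanding glibc malloc
More formal/official documentation: The GNU allocator
A cool Stack Exchange question: How does glibc malloc works
If some of these links are broken by the time you read this, it should be easy to find similar articles. The topic is very popular, once you know what to look for and how.
I hope these observations describe the whole picture well enough and also give enough material for further digging.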