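My multithreaded MFC application sometimes hangs in production.
We forced several dumps from Task Manager, one after another.
In the first dump, opened with WinDbg and/or Visual Studio, the main thread shows the following: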
. 0 Id: af4.af8 Suspend: 0 Teb: 000007ff`fffde000 Unfrozen
# Child-SP RetAddr Call Site
00 00000000`0102e368 00000000`7740ac38 ntdll!ZwWaitForSingleObject+0xa
01 00000000`0102e370 00000000`7740ab34 ntdll!RtlpWaitOnCriticalSection+0xe8
02 00000000`0102e420 00000000`77413165 ntdll!RtlEnterCriticalSection+0xd1
03 00000000`0102e450 00000000`77412e53 ntdll!RtlpReferenceCurrentDirectory+0x25
04 00000000`0102e4f0 00000000`77412478 ntdll!RtlGetFullPathName_Ustr+0xa56
05 00000000`0102e640 00000000`7741269a ntdll!RtlpDosPathNameToRelativeNtPathName_Ustr+0xd8
06 00000000`0102e910 000007fe`fd427acd ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x6a
07 00000000`0102e960 00000001`411569ee KERNELBASE!GetFileAttributesExW+0x2d
08 00000000`0102ea30 00000001`41156995 rle!_waccess_s+0x4e [minkernel\crts\ucrt\src\appcrt\filesystem\waccess.cpp @ 23]
09 00000000`0102ea90 00000001`4039fb30 rle!_waccess+0x9 [minkernel\crts\ucrt\src\appcrt\filesystem\waccess.cpp @ 54]
0a 00000000`0102eac0 00000001`403a0e99 rle!WriteStringOnLogFile+0x150 [C:\usr\work\hq\COMMON_APP.CPP @ 3300]
0b 00000000`0102f070 00000001`401e4107 rle!WriteStringOnLogFile+0xc9 [C:\usr\work\hq\COMMON_APP.CPP @ 3253]
0c 00000000`0102f0d0 00000001`404e7d8c rle!CBlisterView::LavoroDopoStopMacchina+0x937 [C:\usr\work\hq\BLISTVW.CPP @ 15185]
And this other thread is, I think, the one holding the lock that the main thread is waiting on:
14 Id: af4.bd8 Suspend: 1 Teb: 000007ff`fff94000 Unfrozen
# Child-SP RetAddr Call Site
00 00000000`4475f1c0 00000000`77412e53 ntdll!RtlpReferenceCurrentDirectory+0x40
01 00000000`4475f260 00000000`77412478 ntdll!RtlGetFullPathName_Ustr+0xa56
02 00000000`4475f3b0 00000000`7741269a ntdll!RtlpDosPathNameToRelativeNtPathName_Ustr+0xd8
03 00000000`4475f680 000007fe`fd424407 ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x6a
04 00000000`4475f6d0 00000000`771b0dad KERNELBASE!CreateFileW+0xa7
05 00000000`4475f830 00000001`4115e14d kernel32!CreateFileWImplementation+0x7d
06 00000000`4475f890 00000001`40548fb6 rle!common_stat<_stat64i32>+0x75 [minkernel\crts\ucrt\src\appcrt\filesystem\stat.cpp @ 463]
07 00000000`4475f8f0 00000001`4054961f rle!CFileCopyThread::CopiaTuttoInRemoto+0x216 [C:\usr\work\hq\rle\sources\LAVORO.CPP @ 12137]
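Note how both threads go through the same functions up to the point of entering the critical section.
The other threads are in WaitForMultipleObjects, as expected.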
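To illustrate what I think the first dump shows, here is a minimal, portable sketch (not my actual code). It assumes that thread 14, which is listed with `Suspend: 1`, already owns the critical section that `RtlpReferenceCurrentDirectory` enters, and that it is not running while the main thread waits. The names `g_path_lock` and `main_thread_blocks_on_stalled_worker` are hypothetical, and a `std::timed_mutex` with a timeout stands in for the real critical section so that the example terminates instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the process-wide lock seen in the dump
// (the critical section entered by RtlpReferenceCurrentDirectory).
static std::timed_mutex g_path_lock;

// Returns true if the "main" thread cannot get the lock within `timeout`
// while a stalled worker thread is holding it.
bool main_thread_blocks_on_stalled_worker(std::chrono::milliseconds timeout)
{
    std::mutex m;
    std::condition_variable cv;
    bool worker_has_lock = false;
    bool release_worker = false;

    std::thread worker([&] {
        // The worker takes the shared lock, like thread 14 on its CreateFileW path.
        std::lock_guard<std::timed_mutex> hold(g_path_lock);
        std::unique_lock<std::mutex> lk(m);
        worker_has_lock = true;
        cv.notify_all();
        // Assumption: stands in for the thread being stopped while inside the lock.
        cv.wait(lk, [&] { return release_worker; });
    });

    {
        // Wait until the worker really owns the lock.
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return worker_has_lock; });
    }

    // Stand-in for the main thread calling _waccess -> GetFileAttributesExW.
    // The real wait is unbounded; the timeout only keeps this sketch finite.
    bool acquired = g_path_lock.try_lock_for(timeout);
    if (acquired) {
        g_path_lock.unlock();
    }

    {
        // Let the worker continue so it can release the lock and exit.
        std::lock_guard<std::mutex> lk(m);
        release_worker = true;
    }
    cv.notify_all();
    worker.join();

    return !acquired;
}
```

Calling `main_thread_blocks_on_stalled_worker(std::chrono::milliseconds(50))` returns `true`: the caller never gets the lock while the worker is stalled. In the real process there is no timeout, which would match a main thread parked in `RtlpWaitOnCriticalSection` forever.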
Another dump shows a similar situation, but with the main thread at a different point in the same function:
. 0 Id: a80.a84 Suspend: 0 Teb: 000007ff`fffde000 Unfrozen
# Child-SP RetAddr Call Site
00 00000000`0102e5a8 00000000`770e8156 ntdll!NtWaitForKeyedEvent+0xa
01 00000000`0102e5b0 000007fe`fced722d ntdll!RtlAcquireSRWLockExclusive+0x106
02 00000000`0102e610 000007fe`fced75fc KERNELBASE!GetTimeZoneInformationCacheForYear+0x14b
03 00000000`0102e6a0 00000001`40f4380d KERNELBASE!SystemTimeToTzSpecificLocalTime+0x48
04 00000000`0102e890 00000001`40f43e06 rle!ATL::CTime::CTime+0x51 [d:\a01\_work\6\s\src\vctools\vc7libs\ship\atlmfc\include\atltime.h @ 508]
05 00000000`0102e910 00000001`40f43bb0 rle!CFile::GetStatus+0x122 [d:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\filest.cpp @ 135]
06 00000000`0102e9c0 00000001`40f2a00f rle!CFile::GetFilePath+0x3c [d:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\filest.cpp @ 79]
07 00000000`0102ec40 00000001`401d68b8 rle!CArchive::CArchive+0x67 [d:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\arccore.cpp @ 248]
08 00000000`0102eca0 00000001`403e3980 rle!CBlisterDoc::OnSaveDocument+0x1f8 [C:\usr\work\hq\rle\sources\BLISTDOC.cpp @ 335]
09 00000000`0102ede0 00000001`401e4644 rle!CBlisterView::SaveDocument+0xc70 [C:\usr\work\hq\rle\sources\COMMON_VIEW.CPP @ 4498]
0a 00000000`0102f0d0 00000001`404e7d8c rle!CBlisterView::LavoroDopoStopMacchina+0xe74 [C:\usr\work\hq\rle\sources\BLISTVW.CPP @ 15270]
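As far as I understand, low-level system functions are waiting for something, but I don't understand why.
Can this kind of problem happen randomly after memory corruption?
How can I learn more about this kind of problem and find its root cause?
Are there any tools or techniques that would help me understand it better?
Thanks in advance.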