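The program kept leaking memory, and the cause turned out to be an async infinite loop!
I. Background
1. The story
Last month a friend contacted me: his program was leaking memory and he did not know how to analyze it any further.
His description was concise enough, so let's let WinDbg do the talking.
II. WinDbg analysis
1. Which side is leaking
According to my friend, memory blows up after the program has run for a while. Presumably nobody was hurt, otherwise he would not be chatting with me on WeChat. We can start with .time to see how long the current process has been running.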
- 0:000> .time
 - Debug session time: Thu Oct 21 14:54:39.000 2021 (UTC + 8:00)
 - System Uptime: 6 days 4:37:27.851
 - Process Uptime: 0 days 0:40:14.000
 - Kernel time: 0 days 0:01:55.000
 - User time: 0 days 0:07:33.000
 
The dump was captured after the process had been running for about 40 minutes. Next, compare the process's committed memory with the size of the GC heap to see which side is leaking.
- 0:000> !address -summary
 - --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
 - MEM_FREE 327 7dfc`c665a000 ( 125.987 TB) 98.43%
 - MEM_RESERVE 481 201`e91a2000 ( 2.007 TB) 99.74% 1.57%
 - MEM_COMMIT 2307 1`507f4000 ( 5.258 GB) 0.26% 0.00%
 - 0:000> !eeheap -gc
 - Number of GC Heaps: 2
 - ------------------------------
 - GC Allocated Heap Size: Size: 0x139923528 (5260850472) bytes.
 - GC Committed Heap Size: Size: 0x13bf23000 (5300695040) bytes.
 
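2. What is taking up so much memory
The GC heap holds 5,260,850,472 bytes, roughly 93% of the 5.258 GB (0x1507f4000 = 5,645,484,032 bytes) that the process has committed, so the leak is on the managed side. That is a relief, because managed leaks are much easier to dig into. Next, use !dumpheap -stat to see which objects dominate the heap.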
- 0:000> !dumpheap -stat
 - Statistics:
 - MT Count TotalSize Class Name
 - 00007ffdeb1fc400 5362921 128710104 xxxBLLs.xxx.BundleBiz+<>c__DisplayClass20_0
 - 00007ffdeaeff140 5362929 171613728 System.Collections.Generic.List`1[[xxx.xxx, xxx]]
 - 00007ffdeaeff640 5362957 171615272 xxx.BLLs.Plan.Dto.xxx[]
 - 00007ffde8171e18 16146362 841456072 System.String
 - 00007ffdeb210098 5362921 1415811144 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - 00007ffdea9ca260 5362921 2359685240 xxx.Bundle
 
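The output shows that memory is mostly consumed by two kinds of objects, xxx.Bundle and AsyncTaskMethodBuilder, each with about 5.36 million instances. There is something very telling here: if you know async, you will recognize the AsyncTaskMethodBuilder + VoidTaskResult combination at a glance. It is the heap-allocated box that holds the state machine of a suspended async method returning Task. From experience, my friend has most likely wandered into infinite async recursion. How do we dig it out? Read on.
3. Finding the problem code
Look at the xxx.BundleBiz+<DistributionBundle>d__20 type in the !dumpheap output above: it is the compiler-generated state machine class for the DistributionBundle method, and it can be located by decompiling the assembly with ILSpy.
That gives us the source, but what ILSpy shows is the decompiled async state machine. The next question is how to get from the state machine back to the original async/await code. ILSpy has a Used By feature that is handy here.
Double-clicking Used By reveals the real calling code, which looks like this after simplification: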
- public async Task DistributionBundle(List<Bundle> list, List<xxx> bwdList, xxx item, List<xxx> sumDetails, List<xxx> details, BundleParameter bundleParameter, IEnumerable<dynamic> labels)
 - {
 -     int num = 0;
 -     foreach (xxx detail in sumDetails)
 -     {
 -         IEnumerable<xxx> woDetails = details.Where((xxx w) => w.Size == detail.Size && w.Color == detail.Color);
 -         foreach (xxx item2 in woDetails)
 -         {
 -             xxx
 -         }
 -         woDetails = woDetails.OrderBy((xxx s) => s.Seq).ToList();
 -         num++;
 -         xxx
 -         Bundle bundle = new Bundle();
 -         Bundle bundle2 = bundle;
 -         bundle2.BundleId = await _repo.CreateBundleId();
 -         foreach (xxx item3 in woDetails)
 -         {
 -             item3.TaskQty = item3.WoQty + Math.Ceiling(item3.WoQty * item3.OverCutRate);
 -             decimal value = default(decimal);
 -         }
 -         // Recursive call: passes exactly the same arguments it received.
 -         await DistributionBundle(list, bwdList, item, sumDetails, details, bundleParameter, labels);
 -     }
 - }
 
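Look closely at the code above. Whoa: await DistributionBundle(list, bwdList, item, sumDetails, details, bundleParameter, labels); calls the method itself again with the very same arguments, so under some condition it has fallen into endless recursion.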
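To see why this shape piles up objects, here is a minimal sketch, not the friend's actual code. The names PendingCallsSketch, Recurse and limit are hypothetical, Task.Yield() merely stands in for the database round trip, and limit exists only so that the sketch terminates. It counts how many calls have started but not yet finished: every level stays suspended, with its state machine kept alive on the heap, until the level below it completes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PendingCallsSketch
{
    private static int _pending;   // calls that have started but not finished
    public static int Peak;        // highest value _pending ever reached

    // Hypothetical stand-in for the self-recursive DistributionBundle.
    // 'limit' exists only so the sketch ends; the real method had no such bound.
    public static async Task Recurse(int depth, int limit)
    {
        int now = Interlocked.Increment(ref _pending);
        if (now > Peak) Peak = now;

        await Task.Yield();                      // assumption: stands in for the database call

        if (depth < limit)
            await Recurse(depth + 1, limit);     // this level stays suspended until the child finishes

        Interlocked.Decrement(ref _pending);
    }

    public static void Main()
    {
        Recurse(0, 1000).GetAwaiter().GetResult();
        Console.WriteLine($"peak pending calls: {Peak}");   // prints "peak pending calls: 1001"
    }
}
```

With no limit at all, the pending count would simply keep growing, which matches the picture of millions of AsyncStateMachineBox instances in the dump.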
Some readers may ask: experience aside, can this be worked out from the dump itself? Of course. Pick one of the 5 million+ objects and look at its !gcroot.
- 0:000> !DumpHeap /d -mt 00007ffdeb210098
 - Address MT Size
 - 000001a297913a68 00007ffdeb210098 264
 - 000001a297913b70 00007ffdeb210098 264
 - 0:000> !gcroot 000001a297913a68
 - Thread 5ac:
 - 000000470B1EE4E0 00007FFE45103552 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
 - rbp+10: 000000470b1ee550
 - -> 000001A297A25D88 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
 - -> 000001A29796D8C0 Microsoft.Extensions.Hosting.Internal.Host
 - ...
 - -> 000001A298213248 System.Data.SqlClient.TdsParserStateObjectNative
 - -> 000001A32E6AB700 System.Threading.Tasks.TaskFactory`1+<>c__DisplayClass38_0`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient],[System.Data.CommandBehavior, System.Data.Common]]
 - -> 000001A32E6AB728 System.Threading.Tasks.Task`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]]
 - -> 000001A32E6ABB18 System.Threading.Tasks.StandardTaskContinuation
 - -> 000001A32E6ABA80 System.Threading.Tasks.ContinuationTaskFromResultTask`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]]
 - -> 000001A32E6AB6C0 System.Action`1[[System.Threading.Tasks.Task`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]], System.Private.CoreLib]]
 - -> 000001A32E6AB428 System.Data.SqlClient.SqlCommand+<>c__DisplayClass130_0
 - ...
 - -> 000001A32E6ABC08 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.String, System.Private.CoreLib],[Dapper.SqlMapper+<QueryRowAsync>d__34`1[[System.String, System.Private.CoreLib]], Dapper]]
 - -> 000001A32E6ABD20 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.String, System.Private.CoreLib],[xxx.DALs.xxx.BundleRepo+<CreateBundleId>d__12, xxx]]
 - -> 000001A32E6ABD98 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - -> 000001A32E6A6BD8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - -> 000001A433250520 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - -> 000001A32E69E0F8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - -> 000001A433247D28 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - -> 000001A433246330 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - -> 000001A32E69A568 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - -> 000001A433245408 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
 - ...
 
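Judging from the reference chain, the code seems to have fallen into the endless loop while reading records from the database.
4. Why was there no stack overflow
As soon as they see infinite recursion, many readers will ask why there was no stack overflow; after all, a thread's default stack is only 1 MB. According to !gcroot, all of these references hang off thread 5ac, which is the main thread in the output below, and the main thread's stack is very clean.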
- 0:000> !t
 - ThreadCount: 30
 - UnstartedThread: 0
 - BackgroundThread: 24
 - PendingThread: 0
 - DeadThread: 5
 - Hosted Runtime: no
 - Lock
 - DBG ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
 - 0 1 5ac 000001A29752CDF0 202a020 Preemptive 0000000000000000:0000000000000000 000001a29754c570 0 MTA
 - 4 2 1e64 000001A29752A490 2b220 Preemptive 0000000000000000:0000000000000000 000001a29754c570 0 MTA (Finalizer)
 - ...
 - 0:000> !clrstack
 - OS Thread Id: 0x5ac (0)
 - Child SP IP Call Site
 - 000000470B1EE1D0 00007ffe5eb30544 [GCFrame: 000000470b1ee1d0]
 - 000000470B1EE318 00007ffe5eb30544 [HelperMethodFrame_1OBJ: 000000470b1ee318] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
 - 000000470B1EE440 00007ffe45103c25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
 - 000000470B1EE4E0 00007ffe45103552 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
 - 000000470B1EE550 00007ffe451032cf System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2861]
 - 000000470B1EE5D0 00007ffe45121b04 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task) [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/TaskAwaiter.cs @ 143]
 - 000000470B1EE600 00007ffe4510482d System.Runtime.CompilerServices.TaskAwaiter.GetResult() [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/TaskAwaiter.cs @ 106]
 - 000000470B1EE630 00007ffe4de36595 Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(Microsoft.Extensions.Hosting.IHost) [/_/src/Hosting/Abstractions/src/HostingAbstractionsHostExtensions.cs @ 49]
 - 000000470B1EE660 00007ffde80f3b4b xxx.Program.Main(System.String[])
 - 000000470B1EE8B8 00007ffe47c06c93 [GCFrame: 000000470b1ee8b8]
 - 000000470B1EEE50 00007ffe47c06c93 [GCFrame: 000000470b1eee50]
 
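If you know a little about how async works, you will be familiar with the I/O completion port (IOCP), which binds a handle to the ThreadPool. Each await on an unfinished database call returns control to its caller, and the rest of the method is parked on the heap as a continuation that runs only when the I/O completion callback arrives. The infinite recursion therefore just keeps adding entries to the IOCP's pending callbacks. In theory it has little to do with stack space, so no stack overflow occurs.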
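The difference can be made visible with a small sketch under stated assumptions: StackDepthSketch and its members are hypothetical names, and Task.Yield() again stands in for the real I/O, so the continuation is queued to the thread pool rather than delivered by an IOCP callback. It records the stack depth seen at each level of a plain recursion and of an async recursion.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class StackDepthSketch
{
    public static int MaxSyncFrames;    // deepest stack seen inside the plain recursion
    public static int MaxAsyncFrames;   // deepest stack seen inside the async recursion

    // Plain recursion: every level sits on top of its caller's stack frame.
    public static int RecurseSync(int depth, int limit)
    {
        MaxSyncFrames = Math.Max(MaxSyncFrames, new StackTrace().FrameCount);
        return depth < limit ? 1 + RecurseSync(depth + 1, limit) : 1;
    }

    // Async recursion: after the await, each level resumes from a thread pool
    // work item, not on top of the caller that started it.
    public static async Task RecurseAsync(int depth, int limit)
    {
        await Task.Yield();   // assumption: stands in for the database I/O
        MaxAsyncFrames = Math.Max(MaxAsyncFrames, new StackTrace().FrameCount);
        if (depth < limit)
            await RecurseAsync(depth + 1, limit);
    }

    public static void Main()
    {
        RecurseSync(0, 500);
        RecurseAsync(0, 500).GetAwaiter().GetResult();
        Console.WriteLine($"sync max frames:  {MaxSyncFrames}");
        Console.WriteLine($"async max frames: {MaxAsyncFrames}");
    }
}
```

The sync figure grows with the recursion limit, while the async figure should stay small and roughly constant; the cost of the async version shows up on the heap instead.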
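III. Summary
This memory leak came down mainly to programmer carelessness, perhaps the haze of a long stretch of 996. With the information above, the fix should be straightforward.
This article is reposted from the WeChat official account 「一线码农聊技术」. Please contact that account before reposting.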