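A crash analysis of a .NET optical amplifier test system
I. Background
1. The story
A friend on WeChat reached out to me. His program is deployed on Windows and runs fine in Debug mode, but it crashes when run in Release mode, and if he switches the program to .NET 6 everything works again. He was thoroughly confused and asked me to take a look. Haha, for this kind of problem the simplest route is to capture a dump and analyze it.
II. Crash analysis
1. Why did it crash
Anyone who has analyzed crashing programs knows that, whether the crash is managed or unmanaged, you open with the !analyze -v command. The simplified output is as follows: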
0:000> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
CONTEXT: (.ecxr)
rax=0000000000000004 rbx=000001e34b283ec0 rcx=0000000000000228
rdx=0000000000000000 rsi=000001e34ac2f4e0 rdi=000001e34ab58e70
rip=00007ff95ac53659 rsp=0000007735d7e1c0 rbp=0000007735d7e1e0
r8=0000000000000000 r9=000001e3464ba1c0 r10=0000000000000228
r11=0000000000000228 r12=0000000000000000 r13=000001e34880eae8
r14=000001e34ab58e70 r15=0000000000000008
iopl=0 nv up di pl nz na pe nc
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
System_Private_CoreLib!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw+0x39:
00007ff9`5ac53659 cc int 3
Resetting default scope
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ff95ac53659 (System_Private_CoreLib!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw+0x0000000000000039)
ExceptionCode: e0434f4d (CLR exception)
ExceptionFlags: 00000000
NumberParameters: 0
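...
The ExceptionCode: e0434f4d (CLR exception) line in the output shows that this is a classic managed exception. Since it is managed, the problem is fairly simple: use !t to find out which managed thread threw it. The output is as follows: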
0:000> !t
ThreadCount: 15
UnstartedThread: 0
BackgroundThread: 11
PendingThread: 0
DeadThread: 3
Hosted Runtime: no
Lock
DBG ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
0 1 81d8 000001E3464BA1C0 a6028 Preemptive 000001E34ABA2340:000001E34ABA3D30 000001e347fc40b0 -00001 STA Prism.Ioc.ContainerResolutionException 000001e34a986608
6 2 448c 000001E34803A440 2b228 Preemptive 000001E34A876980:000001E34A8784B0 000001e347fc40b0 -00001 MTA (Finalizer)
...
The Prism.Ioc.ContainerResolutionException on thread 0 suggests the problem is related to Prism. Next, use the !pe command to look at the call stack in detail.
0:000> !pe
Exception object: 000001e34a986608
Exception type: Prism.Ioc.ContainerResolutionException
Message: An unexpected error occurred while resolving 'xxx.Views.LoginWindow'
InnerException: Unity.ResolutionFailedException, Use !PrintException 000001E34A986228 to see more.
StackTrace (generated):
SP IP Function
0000007735D668E0 00007FF95A64DEC8 Prism_Unity_Wpf!Prism.Unity.UnityContainerExtension.Resolve(System.Type, System.ValueTuple`2<System.Type,System.Object>[])+0x2a8
0000007735D7DC60 00007FF95A64DBFD Prism_Unity_Wpf!Prism.Unity.UnityContainerExtension.Resolve(System.Type)+0x3d
0000007735D7DCA0 00007FF95A64DB88 Prism!Prism.Ioc.IContainerProviderExtensions.Resolve[[System.__Canon, System.Private.CoreLib]](Prism.Ioc.IContainerProvider)+0x48
0000007735D7DCF0 00007FF95A956742 xxx!xxx.App.InitializeShell(System.Windows.Window)+0x42
0000007735D7DD40 00007FF959B21148 Prism_Wpf!Prism.PrismApplicationBase.Initialize()+0x208
0000007735D7DDA0 00007FF959B20F17 xxx!xxx.App.<>n__0()+0x17
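....
Judging from the output, this is not the original exception. In other words, there are inner exceptions beneath it, and only the innermost one reveals the root cause. After drilling down layer by layer through the InnerException addresses, I finally reached the original exception: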
0:000> !PrintException /d 000001E34A97E940
Exception object: 000001e34a97e940
Exception type: System.PlatformNotSupportedException
Message: System.IO.Ports is currently only supported on Windows.
InnerException: <none>
StackTrace (generated):
SP IP Function
0000007735D7B580 00007FF95A9588E7 System_IO_Ports!System.IO.Ports.SerialPort.GetPortNames()+0x47
0000007735D7B5C0 00007FF95A958859 xxx!xxx.ViewModels.LoginWindowViewModel.RefreshComs()+0x19
0000007735D7B600 00007FF95A957FBC xxx!xxx.ViewModels.LoginWindowViewModel..ctor()+0x14c
0000007735D7B9D0 0000000000000000 System_Private_CoreLib!System.RuntimeMethodHandle.InvokeMethod(System.Object, Void**, System.Signature, Boolean)+0x46a770b0
0000007735D7B9D0 00007FF9B8C03106 System_Private_CoreLib!System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(System.Object, System.Reflection.BindingFlags)+0x36
StackTraceString: <none>
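HResult: 80131539
The output shows that GetPortNames() threw a platform-not-supported exception, which is puzzling given that the program is running on Windows.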
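The layer-by-layer drill-down done above with !PrintException has a direct counterpart in code: follow InnerException until it is null. The sketch below is only an illustration of that idea. The class name ExceptionChain and its two methods are hypothetical helpers, not part of Prism, Unity, or the friend's program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class ExceptionChain
{
    // Yields the exception itself, then each InnerException, outermost first.
    public static IEnumerable<Exception> Walk(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            yield return current;
        }
    }

    // Returns the last exception in the chain, i.e. the one whose InnerException is null.
    public static Exception Innermost(Exception ex)
    {
        Exception current = ex;
        while (current.InnerException != null)
        {
            current = current.InnerException;
        }
        return current;
    }
}
```

Logging every entry from Walk at application startup would have put the PlatformNotSupportedException directly in the log, without needing a dump. The built-in Exception.GetBaseException() serves a similar purpose for simple chains.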
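2. Why is the platform not supported
To understand the PlatformNotSupportedException, the only option is to look at the relevant source code. Here is a dnSpy screenshot:
[Image]
The screenshot shows that the method is an empty stub. Next I searched the web for this exception, and it looks like this friend needs to upgrade or downgrade the version of System.IO.Ports. Screenshot:
[Image]
Full link: https://learn.microsoft.com/en-us/answers/questions/1621393/system-io-ports-only-availble-on-windows-but-im-us
I was actually quite excited at first, thinking the Debug/Release difference was caused by something like multiple threads operating on a non-volatile variable. It turned out to be just this. Sigh!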
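Separately from fixing the package version, the stack trace shows that the call to GetPortNames() runs inside the LoginWindowViewModel constructor, so any exception there makes the whole window unresolvable. A defensive wrapper can keep the window alive and surface the error instead. The sketch below is a minimal illustration under stated assumptions: PortEnumeration and GetPortNamesOrEmpty are hypothetical names, and the port enumerator is passed in as a delegate so the snippet compiles without referencing the System.IO.Ports package.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PortEnumeration
{
    // 'enumerate' stands in for SerialPort.GetPortNames; 'onError' receives the
    // exception so it can be logged or shown to the user.
    public static string[] GetPortNamesOrEmpty(Func<string[]> enumerate, Action<Exception> onError)
    {
        try
        {
            return enumerate();
        }
        catch (PlatformNotSupportedException ex)
        {
            // Report the failure, then fall back to an empty port list
            // so the caller's constructor can still complete.
            onError(ex);
            return Array.Empty<string>();
        }
    }
}
```

In the view model, the call might look like PortEnumeration.GetPortNamesOrEmpty(SerialPort.GetPortNames, ex => logger(ex)), where logger is whatever logging hook the application already has. This does not cure the root cause; it only turns a startup crash into a visible, diagnosable error.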
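III. Summary
This failure was relatively simple. For us veterans it is as easy as 1+1, but we were all beginners once, so this post makes good practice material for those just starting out.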