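Watchdog Mechanism Source Code Analysis
Preface
Linux provides a watchdog. In the Linux kernel, once the watchdog is started it arms a timer; if nothing writes to /dev/watchdog before the timeout expires, the system reboots. A watchdog implemented with a timer like this is a software-level mechanism.
Android adds its own software-level Watchdog to protect important system services. When one of them fails, the Watchdog usually makes the Android system restart. Because of this mechanism, it is common to see a phone restart after the Watchdog kills the system_server process.
This article walks through how it works.
I. How the Watchdog Starts
The ANR mechanism covers applications. For system processes that stay "unresponsive" for a long time, Android provides the Watchdog mechanism. If the unresponsive period exceeds the timeout, the Watchdog makes the process kill itself.
Watchdog is a thread (it extends Thread). SystemServer.java obtains the Watchdog object through getInstance().
1. Starting it in SystemServer.java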
- private void startOtherServices() {
 - ······
 - traceBeginAndSlog("InitWatchdog");
 - final Watchdog watchdog = Watchdog.getInstance();
 - watchdog.init(context, mActivityManagerService);
 - traceEnd();
 - ······
 - traceBeginAndSlog("StartWatchdog");
 - Watchdog.getInstance().start();
 - traceEnd();
 - }
 
Because Watchdog is a thread, calling start() is all it takes to run it.
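The getInstance()/init()/start() sequence above follows the usual "singleton that is also a thread" pattern. A minimal plain-Java sketch of that pattern is shown here; `MiniWatchdog`, its `init()` signature, and the daemon flag are illustrative assumptions, not the real Android class.

```java
// Hypothetical stand-in for the real Watchdog: a singleton that is also a Thread.
final class MiniWatchdog extends Thread {
    private static MiniWatchdog sInstance;
    private volatile boolean initialized;

    private MiniWatchdog() {
        super("watchdog");   // same thread name the article's constructor passes to super()
        setDaemon(true);     // assumption for this sketch, so a demo JVM can exit
    }

    // Lazily create the single instance; every caller receives the same object.
    static synchronized MiniWatchdog getInstance() {
        if (sInstance == null) {
            sInstance = new MiniWatchdog();
        }
        return sInstance;
    }

    // Placeholder for the setup done before start(); the real init takes more arguments.
    void init() {
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }

    @Override
    public void run() {
        // The real loop never returns; this sketch has nothing to monitor.
    }
}
```

Because the constructor is private, the only way to reach the thread is through `getInstance()`, which is why SystemServer can call it twice and still start a single watchdog.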
2. The Watchdog constructor
- private Watchdog() {
 - super("watchdog");
 - // Initialize handler checkers for each common thread we want to check. Note
 - // that we are not currently checking the background thread, since it can
 - // potentially hold longer running operations with no guarantees about the timeliness
 - // of operations there.
 - // The shared foreground thread is the main checker. It is where we
 - // will also dispatch monitor checks and do other work.
 - mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
 - "foreground thread", DEFAULT_TIMEOUT);
 - mHandlerCheckers.add(mMonitorChecker);
 - // Add checker for main thread. We only do a quick check since there
 - // can be UI running on the thread.
 - mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
 - "main thread", DEFAULT_TIMEOUT));
 - // Add checker for shared UI thread.
 - mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
 - "ui thread", DEFAULT_TIMEOUT));
 - // And also check IO thread.
 - mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
 - "i/o thread", DEFAULT_TIMEOUT));
 - // And the display thread.
 - mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
 - "display thread", DEFAULT_TIMEOUT));
 - // Initialize monitor for Binder threads.
 - addMonitor(new BinderThreadMonitor());
 - mOpenFdMonitor = OpenFdMonitor.create();
 - // See the notes on DEFAULT_TIMEOUT.
 - assert DB ||
 - DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
 - // mtk enhance
 - exceptionHWT = new ExceptionLog();
 - }
 
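Two objects deserve attention: mMonitorChecker and mHandlerCheckers.
Where the elements of mHandlerCheckers come from:
Added in the constructor: FgThread, the main thread, UiThread, IoThread and DisplayThread
Added from outside: Watchdog.getInstance().addThread(handler);
Where the monitors held by mMonitorChecker come from:
Added from outside: Watchdog.getInstance().addMonitor(monitor);
Special case: the constructor itself calls addMonitor(new BinderThreadMonitor());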
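The relationship between the two collections can be sketched in plain Java. This is a simplified model under stated assumptions: `CheckerRegistry`, `Checker`, and the string thread names are hypothetical, and real checkers wrap an android.os.Handler rather than a name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how checkers and monitors are registered (names are illustrative).
final class CheckerRegistry {
    interface Monitor {
        void monitor();
    }

    // One checker per watched thread; only the monitor checker ends up holding monitors.
    static final class Checker {
        final String name;
        final List<Monitor> monitors = new ArrayList<>();

        Checker(String name) {
            this.name = name;
        }
    }

    final Checker monitorChecker = new Checker("foreground thread");
    final List<Checker> handlerCheckers = new ArrayList<>();

    CheckerRegistry() {
        // Mirrors the constructor in the article: five built-in checkers.
        handlerCheckers.add(monitorChecker);
        handlerCheckers.add(new Checker("main thread"));
        handlerCheckers.add(new Checker("ui thread"));
        handlerCheckers.add(new Checker("i/o thread"));
        handlerCheckers.add(new Checker("display thread"));
    }

    // External threads get a checker of their own.
    synchronized void addThread(String threadName) {
        handlerCheckers.add(new Checker(threadName));
    }

    // External monitors all land on the single monitor checker.
    synchronized void addMonitor(Monitor monitor) {
        monitorChecker.monitors.add(monitor);
    }
}
```

The point of the sketch is the asymmetry: `addThread` grows the list of checkers, while `addMonitor` grows the monitor list of one particular checker.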
3. The Watchdog run method
- public void run() {
 - boolean waitedHalf = false;
 - boolean mSFHang = false;
 - while (true) {
 - ······
 - synchronized (this) {
 - ······
 - for (int i=0; i<mHandlerCheckers.size(); i++) {
 - HandlerChecker hc = mHandlerCheckers.get(i);
 - hc.scheduleCheckLocked();
 - }
 - ······
 - }
 - ······
 - }
 
The loop schedules a check on every element of mHandlerCheckers.
4. HandlerChecker's scheduleCheckLocked
- public void scheduleCheckLocked() {
 - if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
 - // If the target looper has recently been polling, then
 - // there is no reason to enqueue our checker on it since that
 - // is as good as it not being deadlocked. This avoid having
 - // to do a context switch to check the thread. Note that we
 - // only do this if mCheckReboot is false and we have no
 - // monitors, since those would need to be executed at this point.
 - mCompleted = true;
 - return;
 - }
 - if (!mCompleted) {
 - // we already have a check in flight, so no need
 - return;
 - }
 - mCompleted = false;
 - mCurrentMonitor = null;
 - mStartTime = SystemClock.uptimeMillis();
 - mHandler.postAtFrontOfQueue(this);
 - }
 
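When mMonitors.size() == 0, the checker only needs to know whether its thread is stuck, and it finds out with mHandler.getLooper().getQueue().isPolling(): if the looper is polling (idle), the check is marked complete right away.
mMonitorChecker always holds at least one monitor (the constructor adds BinderThreadMonitor), so for it the call that matters is mHandler.postAtFrontOfQueue(this), which queues the following run() method on the foreground thread: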
- public void run() {
 - final int size = mMonitors.size();
 - for (int i = 0 ; i < size ; i++) {
 - synchronized (Watchdog.this) {
 - mCurrentMonitor = mMonitors.get(i);
 - }
 - mCurrentMonitor.monitor();
 - }
 - synchronized (Watchdog.this) {
 - mCompleted = true;
 - mCurrentMonitor = null;
 - }
 - }
 
run() calls monitor() on every entry in mMonitors. Only mMonitorChecker has any entries, because services register themselves through addMonitor, for example:
- ActivityManagerService.java
 - Watchdog.getInstance().addMonitor(this);
 - InputManagerService.java
 - Watchdog.getInstance().addMonitor(this);
 - PowerManagerService.java
 - Watchdog.getInstance().addMonitor(this);
 - WindowManagerService.java
 - Watchdog.getInstance().addMonitor(this);
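The schedule-then-heartbeat idea behind scheduleCheckLocked() and run() can be imitated without Android classes. The sketch below is an approximation under assumptions: a single-thread ExecutorService stands in for the Handler's thread, tasks are appended rather than posted at the front of the queue, and `HeartbeatChecker` and `demo()` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A checker that posts itself to a worker thread and waits for the "heartbeat" to run.
final class HeartbeatChecker implements Runnable {
    private final ExecutorService target; // stands in for the monitored Handler thread
    private boolean completed = true;

    HeartbeatChecker(ExecutorService target) {
        this.target = target;
    }

    synchronized void scheduleCheck() {
        if (!completed) {
            return; // a check is already in flight
        }
        completed = false;
        target.execute(this); // appended to the queue; the real code posts at the front
    }

    @Override
    public void run() {
        synchronized (this) {
            completed = true; // the worker got around to us, so it is not stuck
        }
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    // Returns {pending while the worker is blocked, completed after it is released}.
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            worker.execute(() -> {
                try {
                    release.await(); // simulate a thread stuck in a long operation
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            HeartbeatChecker checker = new HeartbeatChecker(worker);
            checker.scheduleCheck();
            boolean pendingWhileBlocked = !checker.isCompleted();
            release.countDown();
            worker.shutdown();
            boolean finished = worker.awaitTermination(5, TimeUnit.SECONDS);
            return new boolean[] { pendingWhileBlocked, finished && checker.isCompleted() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[] { false, false };
        } finally {
            worker.shutdownNow();
        }
    }
}
```

While the worker is blocked the flag stays false, which is exactly the condition the Watchdog loop later turns into WAITING, WAITED_HALF, or OVERDUE depending on how long it has lasted.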
 
The monitor method that gets called is very simple; for example, in ActivityManagerService:
- public void monitor() {
 - synchronized (this) { }
 - }
 
It only checks whether the service's lock can be acquired, that is, whether the service is stuck while holding its lock.
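Why an empty synchronized block is enough can be seen in a small plain-Java experiment. This is a sketch under assumptions: `LockProbe`, `probe()`, and the timeout values are hypothetical, and the real Watchdog measures the stall from its own thread rather than with a latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockProbe {
    // Same shape as the monitor() above: returns only once the lock can be taken.
    static void monitor(Object lock) {
        synchronized (lock) { }
    }

    // Runs monitor() on a helper thread and reports whether it finished in time.
    static boolean probe(Object lock, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread helper = new Thread(() -> {
            monitor(lock);
            done.countDown();
        }, "probe");
        helper.setDaemon(true);
        helper.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns {probe result with the lock free, probe result with the lock held}.
    static boolean[] demo() {
        Object lock = new Object();
        boolean whenFree = probe(lock, 2000);
        boolean whenHeld;
        synchronized (lock) { // simulate a service stuck while holding its lock
            whenHeld = probe(lock, 200);
        }
        return new boolean[] { whenFree, whenHeld };
    }
}
```

With the lock free, `monitor()` returns almost immediately; with the lock held elsewhere it cannot return, and that missing return is the signal a watchdog looks for.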
BinderThreadMonitor is an inner class of Watchdog:
- private static final class BinderThreadMonitor implements Watchdog.Monitor {
 - @Override
 - public void monitor() {
 - Binder.blockUntilThreadAvailable();
 - }
 - }
 - android.os.Binder.java
 - public static final native void blockUntilThreadAvailable();
 - android_util_Binder.cpp
 - static void android_os_Binder_blockUntilThreadAvailable(JNIEnv* env, jobject clazz)
 - {
 - return IPCThreadState::self()->blockUntilThreadAvailable();
 - }
 - IPCThreadState.cpp
 - void IPCThreadState::blockUntilThreadAvailable()
 - {
 - pthread_mutex_lock(&mProcess->mThreadCountLock);
 - while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
 - ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
 - static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
 - static_cast<unsigned long>(mProcess->mMaxThreads));
 - pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
 - }
 - pthread_mutex_unlock(&mProcess->mThreadCountLock);
 - }
 
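This only checks that the number of executing Binder threads in the process stays below mMaxThreads; once it reaches the maximum (31 for system_server), the caller has to wait.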
- ProcessState.cpp
 - #define DEFAULT_MAX_BINDER_THREADS 15
 - However, SystemServer.java overrides this value:
 - // maximum number of binder threads used for system_server
 - // will be higher than the system default
 - private static final int sMaxBinderThreads = 31;
 - private void run() {
 - ······
 - BinderInternal.setMaxThreads(sMaxBinderThreads);
 - ······
 - }
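The wait loop in blockUntilThreadAvailable() is a standard condition-variable pattern. A rough Java equivalent is sketched below; `ThreadBudget` and its methods are hypothetical, and the timeout is an addition made so the example can return, whereas the native code above waits indefinitely.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Tracks how many worker threads are busy and lets callers wait for a free one.
final class ThreadBudget {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition decremented = lock.newCondition();
    private final int maxThreads;
    private int executing;

    ThreadBudget(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    // A worker starts handling a call.
    void begin() {
        lock.lock();
        try {
            executing++;
        } finally {
            lock.unlock();
        }
    }

    // A worker finishes a call and wakes up anyone waiting for capacity.
    void end() {
        lock.lock();
        try {
            executing--;
            decremented.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits while executing >= maxThreads; returns false if the timeout expires first.
    boolean awaitAvailable(long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (executing >= maxThreads) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = decremented.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

If every thread in the pool is busy, a monitor built on this kind of wait does not return, so the checker that runs it never reports completion.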
 
5. Exiting after a timeout
- public void run() {
 - ······
 - Process.killProcess(Process.myPid());
 - System.exit(10);
 - ······
 - }
 
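The Watchdog kills its own process (system_server) and exits.
II. How It Works
1. Every service that needs monitoring either calls Watchdog's addMonitor to add a Monitor Checker to the mMonitors list, or calls addThread to add a Looper Checker to the mHandlerCheckers list.
2. Once the Watchdog thread is started, its run method executes and loops forever:
- Step one: call HandlerChecker#scheduleCheckLocked on every entry in mHandlerCheckers.
 - Step two: check periodically for a timeout. The interval between checks is set by the CHECK_INTERVAL constant, which is 30 seconds. Each check calls evaluateCheckerCompletionLocked() to evaluate the completion state of the HandlerCheckers:
 - COMPLETED means the check has finished.
 - WAITING and WAITED_HALF mean the check is still pending but has not timed out; a trace is dumped once when WAITED_HALF is reached.
 - OVERDUE means the check has timed out. By default the timeout is 1 minute.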
 
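3. If the timeout is reached and a HandlerChecker is still incomplete (OVERDUE), getBlockedCheckersLocked() collects the blocked HandlerCheckers, builds a description of them, and saves logs, including runtime stack traces.
4. Finally, the SystemServer process is killed.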
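The four states described in step 2 can be expressed as a function of elapsed time. The sketch below is written from that description, assuming the half-way point is timeout / 2 and that the overall result is the worst state among all checkers; `CompletionState`, `stateOf`, and `worst` are illustrative names.

```java
// Maps "how long has this check been pending" to the four states described above.
final class CompletionState {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // State of a single checker given its completion flag and elapsed time.
    static int stateOf(boolean completed, long elapsedMs, long timeoutMs) {
        if (completed) {
            return COMPLETED;
        }
        if (elapsedMs < timeoutMs / 2) {
            return WAITING;
        }
        if (elapsedMs < timeoutMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Overall state: the worst (numerically largest) state among all checkers.
    static int worst(int[] states) {
        int result = COMPLETED;
        for (int state : states) {
            result = Math.max(result, state);
        }
        return result;
    }
}
```

With a 60-second timeout, a check pending for 10 seconds is WAITING, one pending for 30 seconds is WAITED_HALF, and one pending for 60 seconds is OVERDUE, which matches the 30-second log and 60-second restart mentioned in the article.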
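Summary
Watchdog is a thread that monitors whether the system services are running normally and have not deadlocked.
HandlerChecker checks both Handlers and monitors.
A monitor detects a deadlock by trying to acquire the service's lock.
A 30-second stall produces log output; a 60-second stall triggers a restart.
The Watchdog kills its own process, so the system_server process ID changes afterwards.
This article is reposted from the WeChat official account「Android開發編程」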