A Look at Troubleshooting a Docker EOF Problem
This article is reposted from the WeChat public account 運維開發故事, by the author of that account. Please contact the 運維開發故事 account before reposting.
1. Preface
The troubleshooting process and the source-code analysis below were done and recorded by my developer colleague; with their consent, I am publishing them here.
2. The Problem
One day a customer reported that a large number of warning events were appearing on their pods: Readiness probe failed: OCI runtime exec failed: exec failed: EOF: unknown. Access to the service itself, however, was not affected.
3. Environment
Special note: the customer insisted on enabling the cpu-manager on the k8s nodes that run their business workloads.
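For reference, the static policy is switched on with kubelet flags along these lines (a sketch; the flag names are upstream kubelet flags, but the exact set depends on the deployment):

```
--cpu-manager-policy=static
--cpu-manager-reconcile-period=10s   # upstream default; this 10s period matters later
--kube-reserved=cpu=500m             # the static policy requires a non-zero CPU reservation
```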
| Component | Version |
| --- | --- |
| k8s | 1.14.x |
4. Troubleshooting
1. After receiving the customer's report, we checked the kubelet log on the node where the pod was running:
```
I0507 03:43:28.310630 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(d1aab5f0-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: EOF: unknown
I0507 07:08:49.834093 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(a89a158e-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: unexpected EOF: unknown
I0507 10:06:58.307881 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(d1aab5f0-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: EOF: unknown
```
The probe's error type is failure; the kubelet code that produces this classification is sketched below.
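A minimal sketch of that classification, reconstructed from the k8s 1.14-era pkg/probe/exec package (treat it as approximate rather than a verbatim quote): a Failure result means the probe command ran but exited non-zero, and its combined output becomes the event message.

```go
// Sketch of the kubelet's exec prober (pkg/probe/exec, abridged).
func (pr execProber) Probe(e exec.Cmd) (probe.Result, string, error) {
	// Run the probe command and capture stdout+stderr together.
	data, err := e.CombinedOutput()
	if err != nil {
		if exit, ok := err.(exec.ExitError); ok {
			if exit.ExitStatus() == 0 {
				return probe.Success, string(data), nil
			}
			// Our case: the exec itself failed, and data carries
			// "OCI runtime exec failed: exec failed: EOF: unknown".
			return probe.Failure, string(data), nil
		}
		return probe.Unknown, "", err
	}
	return probe.Success, string(data), nil
}
```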
- time="2021-05-06T16:51:40.009989451+08:00" level=error msg="stream copy error: reading from a closed fifo"
 - time="2021-05-06T16:51:40.010054596+08:00" level=error msg="stream copy error: reading from a closed fifo"
 - time="2021-05-06T16:51:40.170676532+08:00" level=error msg="Error running exec 8e34e8b910694abe95a467b2936b37635fdabd2f7b7c464d
 - fef952fa5732aa4e in container: OCI runtime exec failed: exec failed: EOF: unknown"
 
Although the Docker log says stream copy error, it was actually the underlying runc returning EOF that produced the error.

3. Since the log shows the probe type as Failure, we know that e.CombinedOutput() returned err != nil with a non-zero ExitStatus, and that data held OCI runtime exec failed: exec failed: unexpected EOF: unknown. The call chain ultimately reaches the RunInContainer method.
ExecSync is a gRPC call into dockershim's ExecSync.
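That hop can be exercised by hand. The sketch below, which assumes the CRI v1alpha2 client from k8s.io/cri-api and the default dockershim socket path (both assumptions; the container ID and command are placeholders), issues the same ExecSync RPC the kubelet makes:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

func main() {
	// Dial the dockershim CRI socket (path assumed; adjust per node).
	conn, err := grpc.Dial("unix:///var/run/dockershim.sock", grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// The same RPC the kubelet prober triggers: run a command in the
	// container and collect its output and exit code synchronously.
	resp, err := runtimeapi.NewRuntimeServiceClient(conn).ExecSync(ctx, &runtimeapi.ExecSyncRequest{
		ContainerId: "<container-id>", // placeholder
		Cmd:         []string{"cat", "/tmp/healthy"},
		Timeout:     5, // seconds
	})
	if err != nil {
		fmt.Println("rpc error:", err)
		return
	}
	fmt.Printf("exit=%d stdout=%q stderr=%q\n", resp.ExitCode, resp.Stdout, resp.Stderr)
}
```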
dockershim ultimately lands in the ExecInContainer method, and that method returned an error with a non-zero exit code:
```go
func (*NativeExecHandler) ExecInContainer(client libdocker.Interface, container *dockertypes.ContainerJSON, cmd []string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize, timeout time.Duration) error {
	// (excerpt; createOpts and execStarted setup omitted in the original)
	execObj, err := client.CreateExec(container.ID, createOpts)

	startOpts := dockertypes.ExecStartCheck{Detach: false, Tty: tty}
	streamOpts := libdocker.StreamOptions{
		InputStream:  stdin,
		OutputStream: stdout,
		ErrorStream:  stderr,
		RawTerminal:  tty,
		ExecStarted:  execStarted,
	}
	err = client.StartExec(execObj.ID, startOpts, streamOpts)
	if err != nil {
		return err
	}

	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	count := 0
	for {
		inspect, err2 := client.InspectExec(execObj.ID)
		if err2 != nil {
			return err2
		}
		if !inspect.Running {
			if inspect.ExitCode != 0 {
				err = &dockerExitError{inspect}
			}
			break
		}
		count++
		if count == 5 {
			klog.Errorf("Exec session %s in container %s terminated but process still running!", execObj.ID, container.ID)
			break
		}
		<-ticker.C
	}
	return err
}
```
ExecInContainer does the following (the same flow is reproduced standalone in the sketch after this list):
- Calls CreateExec to create the exec and obtain an ExecID.
- Calls StartExec to run the exec, using holdHijackedConnection to redirect I/O: the inputStream is written to the connection, and the response stream is redirected to stdout and stderr.
- Calls InspectExec to obtain the exec's running state and exit code.
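The same three-step flow can be driven against the Docker API directly. A minimal sketch using the Docker Go SDK (github.com/docker/docker/client); the container ID and command are placeholders, and error handling is reduced to panics:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// 1. CreateExec: register the exec and obtain an exec ID.
	exec, err := cli.ContainerExecCreate(ctx, "<container-id>", types.ExecConfig{
		Cmd:          []string{"cat", "/tmp/healthy"},
		AttachStdout: true,
		AttachStderr: true,
	})
	if err != nil {
		panic(err)
	}

	// 2. StartExec: attach and stream the response back; this stream is
	// what dockershim's holdHijackedConnection redirects to stdout/stderr.
	resp, err := cli.ContainerExecAttach(ctx, exec.ID, types.ExecStartCheck{})
	if err != nil {
		panic(err)
	}
	defer resp.Close()
	io.Copy(os.Stdout, resp.Reader)

	// 3. InspectExec: fetch the running state and exit code.
	inspect, err := cli.ContainerExecInspect(ctx, exec.ID)
	if err != nil {
		panic(err)
	}
	fmt.Println("running:", inspect.Running, "exit code:", inspect.ExitCode)
}
```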
 
那么日志中打印的報錯就是 response stream 傳遞過來的字符流。也就是說,dockerd 的 response 中包含了錯誤值。
We then went digging in the Docker source. ExecStart ends up in dockerd's ContainerExecStart. Given the Docker log above, err carried OCI runtime exec failed: exec failed: EOF: unknown, so ContainerExecStart is what returned the error. ContainerExecStart calls containerd.Exec, which is where dockerd talks to containerd:
```go
// docker/libcontainerd/client_daemon.go

// Exec creates exec process.
//
// The containerd client calls Exec to register the exec config in the shim side.
// When the client calls Start, the shim will create stdin fifo if needs. But
// for the container main process, the stdin fifo will be created in Create not
// the Start call. stdinCloseSync channel should be closed after Start exec
// process.
func (c *client) Exec(ctx context.Context, containerID, processID string, spec *specs.Process, withStdin bool, attachStdio StdioCallback) (int, error) {
	ctr := c.getContainer(containerID)
	if ctr == nil {
		return -1, errors.WithStack(newNotFoundError("no such container"))
	}
	t := ctr.getTask()
	if t == nil {
		return -1, errors.WithStack(newInvalidParameterError("container is not running"))
	}
	if p := ctr.getProcess(processID); p != nil {
		return -1, errors.WithStack(newConflictError("id already in use"))
	}
	var (
		p              containerd.Process
		rio            cio.IO
		err            error
		stdinCloseSync = make(chan struct{})
	)
	fifos := newFIFOSet(ctr.bundleDir, processID, withStdin, spec.Terminal)
	defer func() {
		if err != nil {
			if rio != nil {
				rio.Cancel()
				rio.Close()
			}
		}
	}()
	p, err = t.Exec(ctx, processID, spec, func(id string) (cio.IO, error) {
		rio, err = c.createIO(fifos, containerID, processID, stdinCloseSync, attachStdio)
		return rio, err
	})
	if err != nil {
		close(stdinCloseSync)
		return -1, wrapError(err)
	}
	ctr.addProcess(processID, p)
	// Signal c.createIO that it can call CloseIO
	//
	// the stdin of exec process will be created after p.Start in containerd
	defer close(stdinCloseSync)
	if err = p.Start(ctx); err != nil {
		// use new context for cleanup because old one may be cancelled by user, but leave a timeout to make sure
		// we are not waiting forever if containerd is unresponsive or to work around fifo cancelling issues in
		// older containerd-shim
		ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
		defer cancel()
		p.Delete(ctx)
		ctr.deleteProcess(processID)
		return -1, wrapError(err)
	}
	return int(p.Pid()), nil
}
```
 
這里 new 了一個 FIFOSet,而 reading from a closed fifo 僅出現(xiàn)在 fifo 被 close 掉時,仍然在讀取的情況。即 f.Close() 發(fā)生在 f.Read() 前面。在外層可以看到
```go
defer func() {
	if err != nil {
		if rio != nil {
			rio.Cancel()
			rio.Close() // this Close is what closes the fifos
		}
	}
}()
p, err = t.Exec(ctx, processID, spec, func(id string) (cio.IO, error) {
	rio, err = c.createIO(fifos, containerID, processID, stdinCloseSync, attachStdio)
	return rio, err
})
if err != nil {
	close(stdinCloseSync)
	return -1, wrapError(err)
}
ctr.addProcess(processID, p)
// Signal c.createIO that it can call CloseIO
//
// the stdin of exec process will be created after p.Start in containerd
defer close(stdinCloseSync)
// If p.Start fails, the deferred cleanup above closes the internal fifos,
// which is what produces the "reading from a closed fifo" error
if err = p.Start(ctx); err != nil {
	// use new context for cleanup because old one may be cancelled by user, but leave a timeout to make sure
	// we are not waiting forever if containerd is unresponsive or to work around fifo cancelling issues in
	// older containerd-shim
	ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
	defer cancel()
	p.Delete(ctx)
	ctr.deleteProcess(processID)
	return -1, wrapError(err)
}
```
 
p.Start reaches the code below, which talks to containerd over gRPC.
```go
// github.com/containerd/containerd/task.go
func (t *task) Start(ctx context.Context) error {
	r, err := t.client.TaskService().Start(ctx, &tasks.StartRequest{
		ContainerID: t.id,
	})
	if err != nil {
		t.io.Cancel()
		t.io.Close()
		return errdefs.FromGRPC(err)
	}
	t.pid = r.Pid
	return nil
}
```
 
This gRPC call arrives at the following containerd code:
```go
func (e *execProcess) start(ctx context.Context) (err error) {
	var (
		socket  *runc.Socket
		pidfile = filepath.Join(e.path, fmt.Sprintf("%s.pid", e.id))
	)
	if e.stdio.Terminal {
		if socket, err = runc.NewTempConsoleSocket(); err != nil {
			return errors.Wrap(err, "failed to create runc console socket")
		}
		defer socket.Close()
	} else if e.stdio.IsNull() {
		if e.io, err = runc.NewNullIO(); err != nil {
			return errors.Wrap(err, "creating new NULL IO")
		}
	} else {
		if e.io, err = runc.NewPipeIO(e.parent.IoUID, e.parent.IoGID, withConditionalIO(e.stdio)); err != nil {
			return errors.Wrap(err, "failed to create runc io pipes")
		}
	}
	opts := &runc.ExecOpts{
		PidFile: pidfile,
		IO:      e.io,
		Detach:  true,
	}
	if socket != nil {
		opts.ConsoleSocket = socket
	}
	// err came back as: exec failed: EOF: unknown
	// "runtime" here is an invocation of the runc binary
	if err := e.parent.runtime.Exec(ctx, e.parent.id, e.spec, opts); err != nil {
		close(e.waitBlock)
		return e.parent.runtimeError(err, "OCI runtime exec failed")
	}
	// ... (remainder of start elided)
}
```
 
The code of Exec is as follows:
```go
// Exec executes an additional process inside the container based on a full
// OCI Process specification
func (r *Runc) Exec(context context.Context, id string, spec specs.Process, opts *ExecOpts) error {
	f, err := ioutil.TempFile(os.Getenv("XDG_RUNTIME_DIR"), "runc-process")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	err = json.NewEncoder(f).Encode(spec)
	f.Close()
	if err != nil {
		return err
	}
	args := []string{"exec", "--process", f.Name()}
	if opts != nil {
		oargs, err := opts.args()
		if err != nil {
			return err
		}
		args = append(args, oargs...)
	}
	cmd := r.command(context, append(args, id)...)
	if opts != nil && opts.IO != nil {
		opts.Set(cmd)
	}
	if cmd.Stdout == nil && cmd.Stderr == nil {
		data, err := cmdOutput(cmd, true)
		if err != nil {
			return fmt.Errorf("%s: %s", err, data)
		}
		return nil
	}
	ec, err := Monitor.Start(cmd)
	if err != nil {
		return err
	}
	if opts != nil && opts.IO != nil {
		if c, ok := opts.IO.(StartCloser); ok {
			if err := c.CloseAfterStart(); err != nil {
				return err
			}
		}
	}
	status, err := Monitor.Wait(cmd, ec)
	if err == nil && status != 0 {
		err = fmt.Errorf("%s did not terminate successfully", cmd.Args[0])
	}
	return err
}
```
 
So it is runc that emitted exec failed: EOF: unknown when it ran.
Running the runc exec command in a loop reproduced the failure a small fraction of the time. Further digging showed that while runc exec runs, it reads the container's state.json, and the json decode of that file was what threw the error.
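Our repro amounted to a loop along these lines (a sketch: the --root path is where dockerd keeps runc state on a typical installation and is an assumption here; adjust it for your setup; the container's state.json lives under that root):

```sh
CID=$(docker inspect --format '{{.Id}}' <container>)
# Hammer runc exec while the kubelet cpu-manager periodically rewrites
# <root>/$CID/state.json; occasionally the exec fails with
# "exec failed: EOF: unknown" or "exec failed: unexpected EOF: unknown".
while runc --root /run/docker/runtime-runc/moby exec "$CID" true; do :; done
```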
That pointed at the kubelet cpu-manager: with it enabled, the kubelet periodically updates containers, which rewrites this state.json file, so runc could read content the cpu-manager update had only partially written, making the json decode fail. Comparing the timestamps of the runc EOF errors against the kubelet cpu-manager's container updates (every 10s by default) showed they lined up exactly, confirming the theory.
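The two flavors of the message fall straight out of encoding/json: decoding an empty file yields EOF, while decoding a truncated document yields unexpected EOF. A self-contained illustration (file snapshots simulated with strings):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// state stands in for runc's much larger state.json structure.
type state struct {
	ID string `json:"id"`
}

func main() {
	full := `{"id":"redis"}`
	// Snapshots runc might observe while the file is rewritten in place:
	// nothing yet, half the document, or the complete document.
	for _, snapshot := range []string{"", full[:7], full} {
		var s state
		err := json.NewDecoder(strings.NewReader(snapshot)).Decode(&s)
		fmt.Printf("%q -> %v\n", snapshot, err)
	}
	// Prints:
	//   "" -> EOF
	//   "{\"id\":\"" -> unexpected EOF
	//   "{\"id\":\"redis\"}" -> <nil>
}
```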
Checking whether runc had a fix for this, we found this PR: https://github.com/opencontainers/runc/pull/2467. The idea of the fix is to make saveState atomic, so a reader of state.json can never observe a partially written file, which is exactly what caused the unexpected EOF (or EOF):
```go
// Before the fix
func (c *linuxContainer) saveState(s *State) error {
	f, err := os.Create(filepath.Join(c.root, stateFilename))
	if err != nil {
		return err
	}
	defer f.Close()
	return utils.WriteJSON(f, s)
}

// After the fix
func (c *linuxContainer) saveState(s *State) (retErr error) {
	tmpFile, err := ioutil.TempFile(c.root, "state-")
	if err != nil {
		return err
	}
	defer func() {
		if retErr != nil {
			tmpFile.Close()
			os.Remove(tmpFile.Name())
		}
	}()
	err = utils.WriteJSON(tmpFile, s)
	if err != nil {
		return err
	}
	err = tmpFile.Close()
	if err != nil {
		return err
	}
	stateFilePath := filepath.Join(c.root, stateFilename)
	return os.Rename(tmpFile.Name(), stateFilePath)
}
```
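The pattern generalizes beyond runc: write to a temporary file in the same directory, then rename it over the target. rename(2) is atomic on POSIX filesystems, so a concurrent reader sees either the old or the new file in full, never a torn one. A standalone sketch of the same idea (the helper name is ours):

```go
package main

import (
	"encoding/json"
	"io/ioutil"
	"os"
	"path/filepath"
)

// atomicWriteJSON mirrors the fixed saveState: encode into a temp file
// next to the target, then atomically rename it into place.
func atomicWriteJSON(path string, v interface{}) (retErr error) {
	tmp, err := ioutil.TempFile(filepath.Dir(path), ".state-")
	if err != nil {
		return err
	}
	defer func() {
		// Clean up the temp file only on failure; after a successful
		// rename there is nothing left to remove.
		if retErr != nil {
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()
	if err := json.NewEncoder(tmp).Encode(v); err != nil {
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	if err := atomicWriteJSON("state.json", map[string]int{"cpus": 2}); err != nil {
		panic(err)
	}
}
```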
 
5. Resolution
Either of the following resolves the problem:
- Disable the cpu-manager (see the sketch below).
- Upgrade runc to a build that includes the atomic saveState fix.
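For the first option, note that the kubelet keeps a cpu-manager checkpoint file and will refuse to start if the policy changes while the old checkpoint is still present. A sketch of the procedure, assuming a systemd-managed kubelet and the default data directory:

```sh
systemctl stop kubelet
# Drop --cpu-manager-policy=static from the kubelet flags, then remove the
# stale checkpoint; the kubelet errors out on a policy mismatch otherwise.
rm -f /var/lib/kubelet/cpu_manager_state
systemctl start kubelet
```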