A Look at Troubleshooting a Docker EOF Problem
This article is reposted from the WeChat public account 運維開發故事, by the author of that account. Please contact the 運維開發故事 account before reposting.
1. Preface
The troubleshooting process and the source-code analysis below were done and recorded by my developer colleague; with their consent, I am publishing them here.
2. The Problem
One day a customer reported that a large number of warning events were appearing on their pods: Readiness probe failed: OCI runtime exec failed: exec failed: EOF: unknown. Access to the service itself, however, was not affected.
3. Environment
Special note: the customer insisted on enabling the cpu-manager on the k8s nodes that run their business workloads.
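For reference, the static policy is switched on with kubelet flags along these lines (a sketch; the flag names are upstream kubelet flags, but the exact set depends on the deployment):

```
--cpu-manager-policy=static
--cpu-manager-reconcile-period=10s   # upstream default; this 10s period matters later
--kube-reserved=cpu=500m             # the static policy requires a non-zero CPU reservation
```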
| Component | Version |
| --- | --- |
| k8s | 1.14.x |
4. Troubleshooting
1. After receiving the customer's report, we checked the kubelet log on the node where the pod was running:
```
I0507 03:43:28.310630 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(d1aab5f0-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: EOF: unknown
I0507 07:08:49.834093 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(a89a158e-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: unexpected EOF: unknown
I0507 10:06:58.307881 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(d1aab5f0-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: EOF: unknown
```
The probe's error type is failure; the kubelet code that produces this classification is sketched below.
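A minimal sketch of that classification, reconstructed from the k8s 1.14-era pkg/probe/exec package (treat it as approximate rather than a verbatim quote): a Failure result means the probe command ran but exited non-zero, and its combined output becomes the event message.

```go
// Sketch of the kubelet's exec prober (pkg/probe/exec, abridged).
func (pr execProber) Probe(e exec.Cmd) (probe.Result, string, error) {
	// Run the probe command and capture stdout+stderr together.
	data, err := e.CombinedOutput()
	if err != nil {
		if exit, ok := err.(exec.ExitError); ok {
			if exit.ExitStatus() == 0 {
				return probe.Success, string(data), nil
			}
			// Our case: the exec itself failed, and data carries
			// "OCI runtime exec failed: exec failed: EOF: unknown".
			return probe.Failure, string(data), nil
		}
		return probe.Unknown, "", err
	}
	return probe.Success, string(data), nil
}
```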
- time="2021-05-06T16:51:40.009989451+08:00" level=error msg="stream copy error: reading from a closed fifo"
 - time="2021-05-06T16:51:40.010054596+08:00" level=error msg="stream copy error: reading from a closed fifo"
 - time="2021-05-06T16:51:40.170676532+08:00" level=error msg="Error running exec 8e34e8b910694abe95a467b2936b37635fdabd2f7b7c464d
 - fef952fa5732aa4e in container: OCI runtime exec failed: exec failed: EOF: unknown"
 
Although the Docker log says stream copy error, it was actually the underlying runc returning EOF that produced the error.

3. Since the log shows the probe type as Failure, we know that e.CombinedOutput() returned err != nil with a non-zero ExitStatus, and that data held OCI runtime exec failed: exec failed: unexpected EOF: unknown. The call chain ultimately reaches the RunInContainer method.
ExecSync is a gRPC call into dockershim's ExecSync.
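That hop can be exercised by hand. The sketch below, which assumes the CRI v1alpha2 client from k8s.io/cri-api and the default dockershim socket path (both assumptions; the container ID and command are placeholders), issues the same ExecSync RPC the kubelet makes:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

func main() {
	// Dial the dockershim CRI socket (path assumed; adjust per node).
	conn, err := grpc.Dial("unix:///var/run/dockershim.sock", grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// The same RPC the kubelet prober triggers: run a command in the
	// container and collect its output and exit code synchronously.
	resp, err := runtimeapi.NewRuntimeServiceClient(conn).ExecSync(ctx, &runtimeapi.ExecSyncRequest{
		ContainerId: "<container-id>", // placeholder
		Cmd:         []string{"cat", "/tmp/healthy"},
		Timeout:     5, // seconds
	})
	if err != nil {
		fmt.Println("rpc error:", err)
		return
	}
	fmt.Printf("exit=%d stdout=%q stderr=%q\n", resp.ExitCode, resp.Stdout, resp.Stderr)
}
```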
dockershim ultimately lands in the ExecInContainer method, and that method returned an error with a non-zero exit code:
```go
func (*NativeExecHandler) ExecInContainer(client libdocker.Interface, container *dockertypes.ContainerJSON, cmd []string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize, timeout time.Duration) error {
	// (excerpt; createOpts and execStarted setup omitted in the original)
	execObj, err := client.CreateExec(container.ID, createOpts)

	startOpts := dockertypes.ExecStartCheck{Detach: false, Tty: tty}
	streamOpts := libdocker.StreamOptions{
		InputStream:  stdin,
		OutputStream: stdout,
		ErrorStream:  stderr,
		RawTerminal:  tty,
		ExecStarted:  execStarted,
	}
	err = client.StartExec(execObj.ID, startOpts, streamOpts)
	if err != nil {
		return err
	}

	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	count := 0
	for {
		inspect, err2 := client.InspectExec(execObj.ID)
		if err2 != nil {
			return err2
		}
		if !inspect.Running {
			if inspect.ExitCode != 0 {
				err = &dockerExitError{inspect}
			}
			break
		}
		count++
		if count == 5 {
			klog.Errorf("Exec session %s in container %s terminated but process still running!", execObj.ID, container.ID)
			break
		}
		<-ticker.C
	}
	return err
}
```
ExecInContainer does the following (the same flow is reproduced standalone in the sketch after this list):
- Calls CreateExec to create the exec and obtain an ExecID.
- Calls StartExec to run the exec, using holdHijackedConnection to redirect I/O: the inputStream is written to the connection, and the response stream is redirected to stdout and stderr.
- Calls InspectExec to obtain the exec's running state and exit code.
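The same three-step flow can be driven against the Docker API directly. A minimal sketch using the Docker Go SDK (github.com/docker/docker/client); the container ID and command are placeholders, and error handling is reduced to panics:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// 1. CreateExec: register the exec and obtain an exec ID.
	exec, err := cli.ContainerExecCreate(ctx, "<container-id>", types.ExecConfig{
		Cmd:          []string{"cat", "/tmp/healthy"},
		AttachStdout: true,
		AttachStderr: true,
	})
	if err != nil {
		panic(err)
	}

	// 2. StartExec: attach and stream the response back; this stream is
	// what dockershim's holdHijackedConnection redirects to stdout/stderr.
	resp, err := cli.ContainerExecAttach(ctx, exec.ID, types.ExecStartCheck{})
	if err != nil {
		panic(err)
	}
	defer resp.Close()
	io.Copy(os.Stdout, resp.Reader)

	// 3. InspectExec: fetch the running state and exit code.
	inspect, err := cli.ContainerExecInspect(ctx, exec.ID)
	if err != nil {
		panic(err)
	}
	fmt.Println("running:", inspect.Running, "exit code:", inspect.ExitCode)
}
```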
 
那么日志中打印的報錯就是 response stream 傳遞過來的字符流。也就是說,dockerd 的 response 中包含了錯誤值。
We then went digging in the Docker source. ExecStart ends up in dockerd's ContainerExecStart. Given the Docker log above, err carried OCI runtime exec failed: exec failed: EOF: unknown, so ContainerExecStart is what returned the error. ContainerExecStart calls containerd.Exec, which is where dockerd talks to containerd:
```go
// docker/libcontainerd/client_daemon.go

// Exec creates exec process.
//
// The containerd client calls Exec to register the exec config in the shim side.
// When the client calls Start, the shim will create stdin fifo if needs. But
// for the container main process, the stdin fifo will be created in Create not
// the Start call. stdinCloseSync channel should be closed after Start exec
// process.
func (c *client) Exec(ctx context.Context, containerID, processID string, spec *specs.Process, withStdin bool, attachStdio StdioCallback) (int, error) {
	ctr := c.getContainer(containerID)
	if ctr == nil {
		return -1, errors.WithStack(newNotFoundError("no such container"))
	}
	t := ctr.getTask()
	if t == nil {
		return -1, errors.WithStack(newInvalidParameterError("container is not running"))
	}
	if p := ctr.getProcess(processID); p != nil {
		return -1, errors.WithStack(newConflictError("id already in use"))
	}
	var (
		p              containerd.Process
		rio            cio.IO
		err            error
		stdinCloseSync = make(chan struct{})
	)
	fifos := newFIFOSet(ctr.bundleDir, processID, withStdin, spec.Terminal)
	defer func() {
		if err != nil {
			if rio != nil {
				rio.Cancel()
				rio.Close()
			}
		}
	}()
	p, err = t.Exec(ctx, processID, spec, func(id string) (cio.IO, error) {
		rio, err = c.createIO(fifos, containerID, processID, stdinCloseSync, attachStdio)
		return rio, err
	})
	if err != nil {
		close(stdinCloseSync)
		return -1, wrapError(err)
	}
	ctr.addProcess(processID, p)
	// Signal c.createIO that it can call CloseIO
	//
	// the stdin of exec process will be created after p.Start in containerd
	defer close(stdinCloseSync)
	if err = p.Start(ctx); err != nil {
		// use new context for cleanup because old one may be cancelled by user, but leave a timeout to make sure
		// we are not waiting forever if containerd is unresponsive or to work around fifo cancelling issues in
		// older containerd-shim
		ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
		defer cancel()
		p.Delete(ctx)
		ctr.deleteProcess(processID)
		return -1, wrapError(err)
	}
	return int(p.Pid()), nil
}
```
 
這里 new 了一個 FIFOSet,而 reading from a closed fifo 僅出現(xiàn)在 fifo 被 close 掉時,仍然在讀取的情況。即 f.Close() 發(fā)生在 f.Read() 前面。在外層可以看到
```go
defer func() {
	if err != nil {
		if rio != nil {
			rio.Cancel()
			rio.Close() // this Close is what closes the fifos
		}
	}
}()
p, err = t.Exec(ctx, processID, spec, func(id string) (cio.IO, error) {
	rio, err = c.createIO(fifos, containerID, processID, stdinCloseSync, attachStdio)
	return rio, err
})
if err != nil {
	close(stdinCloseSync)
	return -1, wrapError(err)
}
ctr.addProcess(processID, p)
// Signal c.createIO that it can call CloseIO
//
// the stdin of exec process will be created after p.Start in containerd
defer close(stdinCloseSync)
// If p.Start fails, the deferred cleanup above closes the internal fifos,
// which is what produces the "reading from a closed fifo" error
if err = p.Start(ctx); err != nil {
	// use new context for cleanup because old one may be cancelled by user, but leave a timeout to make sure
	// we are not waiting forever if containerd is unresponsive or to work around fifo cancelling issues in
	// older containerd-shim
	ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
	defer cancel()
	p.Delete(ctx)
	ctr.deleteProcess(processID)
	return -1, wrapError(err)
}
```
 
p.Start reaches the code below, which talks to containerd over gRPC.
```go
// github.com/containerd/containerd/task.go
func (t *task) Start(ctx context.Context) error {
	r, err := t.client.TaskService().Start(ctx, &tasks.StartRequest{
		ContainerID: t.id,
	})
	if err != nil {
		t.io.Cancel()
		t.io.Close()
		return errdefs.FromGRPC(err)
	}
	t.pid = r.Pid
	return nil
}
```
 
This gRPC call arrives at the following containerd code:
```go
func (e *execProcess) start(ctx context.Context) (err error) {
	var (
		socket  *runc.Socket
		pidfile = filepath.Join(e.path, fmt.Sprintf("%s.pid", e.id))
	)
	if e.stdio.Terminal {
		if socket, err = runc.NewTempConsoleSocket(); err != nil {
			return errors.Wrap(err, "failed to create runc console socket")
		}
		defer socket.Close()
	} else if e.stdio.IsNull() {
		if e.io, err = runc.NewNullIO(); err != nil {
			return errors.Wrap(err, "creating new NULL IO")
		}
	} else {
		if e.io, err = runc.NewPipeIO(e.parent.IoUID, e.parent.IoGID, withConditionalIO(e.stdio)); err != nil {
			return errors.Wrap(err, "failed to create runc io pipes")
		}
	}
	opts := &runc.ExecOpts{
		PidFile: pidfile,
		IO:      e.io,
		Detach:  true,
	}
	if socket != nil {
		opts.ConsoleSocket = socket
	}
	// err came back as: exec failed: EOF: unknown
	// "runtime" here is an invocation of the runc binary
	if err := e.parent.runtime.Exec(ctx, e.parent.id, e.spec, opts); err != nil {
		close(e.waitBlock)
		return e.parent.runtimeError(err, "OCI runtime exec failed")
	}
	// ... (remainder of start elided)
}
```
 
The code of Exec is as follows:
```go
// Exec executes an additional process inside the container based on a full
// OCI Process specification
func (r *Runc) Exec(context context.Context, id string, spec specs.Process, opts *ExecOpts) error {
	f, err := ioutil.TempFile(os.Getenv("XDG_RUNTIME_DIR"), "runc-process")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	err = json.NewEncoder(f).Encode(spec)
	f.Close()
	if err != nil {
		return err
	}
	args := []string{"exec", "--process", f.Name()}
	if opts != nil {
		oargs, err := opts.args()
		if err != nil {
			return err
		}
		args = append(args, oargs...)
	}
	cmd := r.command(context, append(args, id)...)
	if opts != nil && opts.IO != nil {
		opts.Set(cmd)
	}
	if cmd.Stdout == nil && cmd.Stderr == nil {
		data, err := cmdOutput(cmd, true)
		if err != nil {
			return fmt.Errorf("%s: %s", err, data)
		}
		return nil
	}
	ec, err := Monitor.Start(cmd)
	if err != nil {
		return err
	}
	if opts != nil && opts.IO != nil {
		if c, ok := opts.IO.(StartCloser); ok {
			if err := c.CloseAfterStart(); err != nil {
				return err
			}
		}
	}
	status, err := Monitor.Wait(cmd, ec)
	if err == nil && status != 0 {
		err = fmt.Errorf("%s did not terminate successfully", cmd.Args[0])
	}
	return err
}
```
 
So it is runc that emitted exec failed: EOF: unknown when it ran.
Running the runc exec command in a loop reproduced the failure a small fraction of the time. Further digging showed that while runc exec runs, it reads the container's state.json, and the json decode of that file was what threw the error.
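Our repro amounted to a loop along these lines (a sketch: the --root path is where dockerd keeps runc state on a typical installation and is an assumption here; adjust it for your setup; the container's state.json lives under that root):

```sh
CID=$(docker inspect --format '{{.Id}}' <container>)
# Hammer runc exec while the kubelet cpu-manager periodically rewrites
# <root>/$CID/state.json; occasionally the exec fails with
# "exec failed: EOF: unknown" or "exec failed: unexpected EOF: unknown".
while runc --root /run/docker/runtime-runc/moby exec "$CID" true; do :; done
```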
That pointed at the kubelet cpu-manager: with it enabled, the kubelet periodically updates containers, which rewrites this state.json file, so runc could read content the cpu-manager update had only partially written, making the json decode fail. Comparing the timestamps of the runc EOF errors against the kubelet cpu-manager's container updates (every 10s by default) showed they lined up exactly, confirming the theory.
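The two flavors of the message fall straight out of encoding/json: decoding an empty file yields EOF, while decoding a truncated document yields unexpected EOF. A self-contained illustration (file snapshots simulated with strings):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// state stands in for runc's much larger state.json structure.
type state struct {
	ID string `json:"id"`
}

func main() {
	full := `{"id":"redis"}`
	// Snapshots runc might observe while the file is rewritten in place:
	// nothing yet, half the document, or the complete document.
	for _, snapshot := range []string{"", full[:7], full} {
		var s state
		err := json.NewDecoder(strings.NewReader(snapshot)).Decode(&s)
		fmt.Printf("%q -> %v\n", snapshot, err)
	}
	// Prints:
	//   "" -> EOF
	//   "{\"id\":\"" -> unexpected EOF
	//   "{\"id\":\"redis\"}" -> <nil>
}
```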
Checking whether runc had a fix for this, we found this PR: https://github.com/opencontainers/runc/pull/2467. The idea of the fix is to make saveState atomic, so a reader of state.json can never observe a partially written file, which is exactly what caused the unexpected EOF (or EOF):
```go
// Before the fix
func (c *linuxContainer) saveState(s *State) error {
	f, err := os.Create(filepath.Join(c.root, stateFilename))
	if err != nil {
		return err
	}
	defer f.Close()
	return utils.WriteJSON(f, s)
}

// After the fix
func (c *linuxContainer) saveState(s *State) (retErr error) {
	tmpFile, err := ioutil.TempFile(c.root, "state-")
	if err != nil {
		return err
	}
	defer func() {
		if retErr != nil {
			tmpFile.Close()
			os.Remove(tmpFile.Name())
		}
	}()
	err = utils.WriteJSON(tmpFile, s)
	if err != nil {
		return err
	}
	err = tmpFile.Close()
	if err != nil {
		return err
	}
	stateFilePath := filepath.Join(c.root, stateFilename)
	return os.Rename(tmpFile.Name(), stateFilePath)
}
```
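The pattern generalizes beyond runc: write to a temporary file in the same directory, then rename it over the target. rename(2) is atomic on POSIX filesystems, so a concurrent reader sees either the old or the new file in full, never a torn one. A standalone sketch of the same idea (the helper name is ours):

```go
package main

import (
	"encoding/json"
	"io/ioutil"
	"os"
	"path/filepath"
)

// atomicWriteJSON mirrors the fixed saveState: encode into a temp file
// next to the target, then atomically rename it into place.
func atomicWriteJSON(path string, v interface{}) (retErr error) {
	tmp, err := ioutil.TempFile(filepath.Dir(path), ".state-")
	if err != nil {
		return err
	}
	defer func() {
		// Clean up the temp file only on failure; after a successful
		// rename there is nothing left to remove.
		if retErr != nil {
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()
	if err := json.NewEncoder(tmp).Encode(v); err != nil {
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	if err := atomicWriteJSON("state.json", map[string]int{"cpus": 2}); err != nil {
		panic(err)
	}
}
```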
 
5. Resolution
Either of the following resolves the problem:
- Disable the cpu-manager (see the sketch below).
- Upgrade runc to a build that includes the atomic saveState fix.
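For the first option, note that the kubelet keeps a cpu-manager checkpoint file and will refuse to start if the policy changes while the old checkpoint is still present. A sketch of the procedure, assuming a systemd-managed kubelet and the default data directory:

```sh
systemctl stop kubelet
# Drop --cpu-manager-policy=static from the kubelet flags, then remove the
# stale checkpoint; the kubelet errors out on a policy mismatch otherwise.
rm -f /var/lib/kubelet/cpu_manager_state
systemctl start kubelet
```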