Kubernetes Pod suddenly can't mount its Ceph RBD volume...
This article is reposted from the WeChat official account 「云原生實驗室」 (Cloud Native Lab), written by 米開朗基楊. To repost it, please contact the 云原生實驗室 official account.
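Preface
Is Kubernetes full of pitfalls? Yes! Is Ceph full of pitfalls? Yes! And the two of them together? One giant pit!
A while ago I deployed a highly available Harbor image registry in a Kubernetes cluster, with Ceph RBD providing the persistent storage. Everything ran nicely until yesterday, when one node went NotReady. The Pod running one of Harbor's components was rescheduled, but the rescheduled Pod did not start.
Looking at the events with kubectl describe pod showed the following Warning: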
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- Normal Scheduled 23s default-scheduler Successfully assigned harbor/harbor-harbor-registry-5796cdddd7-kxzp9 to k8s03
- Warning FailedAttachVolume 22s attachdetach-controller Multi-Attach error for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3" Volume is already exclusively attached to one node and can't be attached to another
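Since the Multi-Attach error is reported by the attachdetach-controller, it may also help to check which node the cluster still believes the volume is attached to. The following is a minimal sketch, assuming the volume is CSI-provisioned so that a VolumeAttachment object exists for it. The helper name filter_attachments is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: reads rows of "NAME PV NODE ATTACHED" on stdin and
# prints the attachment name, node, and attached flag for the given PV.
filter_attachments() {
  awk -v pv="$1" '$2 == pv { print $1, $3, $4 }'
}

# Intended usage (needs cluster access, so it is left commented out):
# kubectl get volumeattachment --no-headers \
#   -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached \
#   | filter_attachments pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3
```

If the row still names the NotReady node, the control plane has probably not released the old attachment yet.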
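Well then: the RBD image behind this PV is still held by another Pod, so it cannot be attached to the new one. I went to the NotReady node and force-removed the old Pod's containers with docker rm -vf xxx, but that made no difference.
It looks like the only option left is to forcibly unmap the block device that the RBD image is mapped to on the node where the old Pod ran. First I need to find the RBD image behind the PV, which can be read straight from the PV itself: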
- 🐳 → kubectl -n harbor get pv pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3 -o go-template='{{.spec.csi.volumeAttributes.imageName}}'
- csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c
On the Ceph admin node, check who is currently using that image:
- 🐳 → rbd status kubernetes/csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c
- Watchers:
- watcher=172.16.7.1:0/3619044864 client.195600 cookie=18446462598732840980
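The watcher line carries the address of the client that still has the image open. When there are several images to check, pulling the IP out by hand gets tedious, so here is a small sketch that extracts it from the rbd status output. The function name watcher_ips is hypothetical, and the pattern only handles IPv4 addresses in the format shown above.

```shell
# Hypothetical helper: reads `rbd status` output on stdin and prints the
# client IPv4 address of every watcher line (text between "watcher=" and ":").
watcher_ips() {
  sed -n 's/^.*watcher=\([0-9.]*\):.*$/\1/p'
}

# Intended usage on the Ceph admin node (left commented out):
# rbd status kubernetes/csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c | watcher_ips
```

Lines without a watcher entry, such as the "Watchers:" header, are simply skipped.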
That is the culprit. I logged in to 172.16.7.1 and forcibly unmapped the block device from inside the csi-rbdplugin container:
- 🐳 → docker ps|grep csi
- 77255fe4f26b 650757c4f32d "/usr/local/bin/ceph…" 3 weeks ago Up 3 weeks k8s_liveness-prometheus_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_5
- fb4e5e10f064 650757c4f32d "/usr/local/bin/ceph…" 3 weeks ago Up 3 weeks k8s_csi-rbdplugin_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_5
- 5330c84529e9 37c1d9ea538b "/csi-node-driver-re…" 3 weeks ago Up 3 weeks k8s_driver-registrar_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_6
- 4452755ffccf k8s.gcr.io/pause:3.2 "/pause" 3 weeks ago Up 3 weeks k8s_POD_csi-rbdplugin-hscf8_ceph-csi_2b7da817-3f4a-4e8f-9f99-a39da07c5b94_5
- 🐳 → docker exec -it fb4e5e10f064 bash
- [root@k8s01 /]# rbd showmapped|grep csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c
- 4 kubernetes csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c - /dev/rbd4
- [root@k8s01 /]# rbd unmap -o force /dev/rbd4
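The lookup of the device node can be scripted as well. This is a rough sketch, assuming the device path is always the last column of the rbd showmapped output (the exact columns may differ between Ceph releases). The helper name mapped_device is hypothetical.

```shell
# Hypothetical helper: reads `rbd showmapped` output on stdin and prints the
# last column (the device path) of any row that contains the given image name.
mapped_device() {
  awk -v img="$1" '{ for (i = 1; i <= NF; i++) if ($i == img) { print $NF; next } }'
}

# Intended usage inside the csi-rbdplugin container (left commented out):
# dev=$(rbd showmapped | mapped_device csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c)
# rbd unmap -o force "$dev"    # the same forced unmap as above
```

A forced unmap skips the clean detach path, so I would only run it once the old Pod is confirmed gone, since writes still in flight on that node could be lost.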
Looking at the new Pod again, it has now started successfully:
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- Normal Scheduled 18m default-scheduler Successfully assigned harbor/harbor-harbor-registry-5796cdddd7-kxzp9 to k8s03
- Warning FailedAttachVolume 18m attachdetach-controller Multi-Attach error for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3" Volume is already exclusively attached to one node and can't be attached to another
- Warning FailedMount 14m kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[default-token-phjbz registry-data registry-root-certificate registry-htpasswd registry-config]: timed out waiting for the condition
- Normal SuccessfulAttachVolume 12m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3"
- Warning FailedMount 12m kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-htpasswd registry-config default-token-phjbz registry-data registry-root-certificate]: timed out waiting for the condition
- Warning FailedMount 5m21s (x2 over 16m) kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-config default-token-phjbz registry-data registry-root-certificate registry-htpasswd]: timed out waiting for the condition
- Warning FailedMount 3m5s (x2 over 9m55s) kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-root-certificate registry-htpasswd registry-config default-token-phjbz registry-data]: timed out waiting for the condition
- Warning FailedMount 2m54s (x9 over 11m) kubelet, k8s03 MountVolume.MountDevice failed for volume "pvc-ec045b5e-2471-469d-9a1b-6e7db0e938b3" : rpc error: code = Internal desc = rbd image kubernetes/csi-vol-bf0dc641-4a5a-11eb-988c-6ab597a1411c is still being used
- Warning FailedMount 50s (x2 over 7m39s) kubelet, k8s03 Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-data registry-root-certificate registry-htpasswd registry-config default-token-phjbz]: timed out waiting for the condition
- Normal Pulling 15s kubelet, k8s03 Pulling image "goharbor/registry-photon:v2.1.2"
- Normal Pulled 12s kubelet, k8s03 Successfully pulled image "goharbor/registry-photon:v2.1.2"
- Normal Created 12s kubelet, k8s03 Created container registry
- Normal Started 12s kubelet, k8s03 Started container registry