Comment réparer «l'exécution du conteneur est en panne, le PLEG n'est pas sain»

Question

J'ai aks avec un cluster kubernetes ayant 2 nœuds. Chaque nœud a environ 6-7 pod fonctionnant avec 2 conteneurs pour chaque pod. Un conteneur est mon image docker et l'autre est créé par istio pour son maillage de service. Mais après environ 10 heures, les nœuds deviennent "non prêts" et le nœud décrit me montre 2 erreurs: 1. l'exécution du conteneur est en panne, le PLEG n'est pas sain: pleg était en dernier actif il y a 1h32m35.942907195s; le seuil est de 3m0s. Erreur 2.rpc: code = DeadlineExceeded desc = délai de contexte dépassé, impossible de se connecter au démon Docker sous unix: ///var/run/docker.sock. Le démon docker est-il en cours d'exécution?

Lorsque je redémarre le nœud, cela fonctionne bien, mais le nœud revient à "PAS PRÊT" après un certain temps. A commencé à faire face à ce problème depuis l'ajout d'istio, mais n'a trouvé aucun document les concernant. La prochaine étape consiste à essayer de mettre à niveau kubernetes

Le nœud décrit le journal:

Name: aks-agentpool-22124581-0 Roles: agent Labels: agentpool=agentpool beta.kubernetes.io/Arch=AMD64 beta.kubernetes.io/instance-type=Standard_B2s beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eastus failure-domain.beta.kubernetes.io/zone=1 kubernetes.Azure.com/cluster=MC_XXXXXXXXX kubernetes.io/hostname=aks-XXXXXXXXX kubernetes.io/role=agent node-role.kubernetes.io/agent= storageprofile=managed storagetier=Premium_LRS Annotations: aks.Microsoft.com/remediated=3 node.alpha.kubernetes.io/ttl=0 volumes.kubernetes.io/controller-managed-attach-detach=true CreationTimestamp: Thu, 25 Oct 2018 14:46:53 +0000 Taints: <none> Unschedulable: false Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- NetworkUnavailable False Thu, 25 Oct 2018 14:49:06 +0000 Thu, 25 Oct 2018 14:49:06 +0000 RouteCreated RouteController created a route OutOfDisk False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available MemoryPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure PIDPressure False Wed, 19 Dec 2018 19:28:55 +0000 Thu, 25 Oct 2018 14:46:53 +0000 KubeletHasSufficientPID kubelet has sufficient PID available Ready False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletNotReady container runtime is down,PLEG is not healthy: pleg was lastseen active 1h32m35.942907195s ago; threshold is 3m0s Addresses: Hostname: aks-XXXXXXXXX Capacity: cpu: 2 ephemeral-storage: 30428648Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 4040536Ki pods: 110 Allocatable: cpu: 1940m ephemeral-storage: 28043041951 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 3099480Ki pods: 110 System Info: Machine ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX System UUID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Boot ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Kernel Version: 4.15.0-1035-Azure OS Image: Ubuntu 16.04.5 LTS Operating System: linux Architecture: AMD64 Container Runtime Version: docker://Unknown Kubelet Version: v1.11.3 Kube-Proxy Version: v1.11.3 PodCIDR: 10.244.0.0/24 ProviderID: Azure:///subscriptions/9XXXXXXXXXXX/resourceGroups/MC_XXXXXXXXXXXXXXXXXXXXXXXXXXXX/providers/Microsoft.Compute/virtualMachines/aks-XXXXXXXXXXXX Non-terminated Pods: (42 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- default emailgistics-graph-monitor-6477568564-q98p2 10m (0%) 0 (0%) 0 (0%) 0 (0%) default emailgistics-message-handler-7df4566b6f-mh255 10m (0%) 0 (0%) 0 (0%) 0 (0%) default emailgistics-reports-aggregator-5fd96b94cb-b5vbn 10m (0%) 0 (0%) 0 (0%) 0 (0%) default emailgistics-rules-844b77f46-5lrkw 10m (0%) 0 (0%) 0 (0%) 0 (0%) default emailgistics-scheduler-754884b566-mwgvp 10m (0%) 0 (0%) 0 (0%) 0 (0%) default emailgistics-subscription-token-manager-7974558985-f2t49 10m (0%) 0 (0%) 0 (0%) 0 (0%) default mollified-kiwi-cert-manager-665c5d9c8c-2ld59 0 (0%) 0 (0%) 0 (0%) 0 (0%) istio-system grafana-59b787b9b-dzdtc 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-citadel-5d8956cc6-x55vk 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-egressgateway-f48fc7fbb-szpwp 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-galley-6975b6bd45-g7lsc 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-ingressgateway-c6c4bcdbf-bbgcw 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-pilot-d9b5b9b7c-ln75n 510m (26%) 0 (0%) 2Gi (67%) 0 (0%) istio-system istio-policy-6b465cd4bf-92l57 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-policy-6b465cd4bf-b2z85 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-policy-6b465cd4bf-j59r4 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-policy-6b465cd4bf-s9pdm 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-sidecar-injector-575597f5cf-npkcz 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-telemetry-6944cd768-9794j 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-telemetry-6944cd768-g7gh5 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-telemetry-6944cd768-Gd88n 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-telemetry-6944cd768-px8qb 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-telemetry-6944cd768-xzslh 20m (1%) 0 (0%) 0 (0%) 0 (0%) istio-system istio-tracing-7596597bd7-hjtq2 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system prometheus-76db5fddd5-d6dxs 10m (0%) 0 (0%) 0 (0%) 0 (0%) istio-system servicegraph-758f96bf5b-c9sqk 10m (0%) 0 (0%) 0 (0%) 0 (0%) kube-system addon-http-application-routing-default-http-backend-5ccb95zgfm8 10m (0%) 10m (0%) 20Mi (0%) 20Mi (0%) kube-system addon-http-application-routing-external-dns-59d8698886-h8xds 0 (0%) 0 (0%) 0 (0%) 0 (0%) kube-system addon-http-application-routing-nginx-ingress-controller-ff49qc7 0 (0%) 0 (0%) 0 (0%) 0 (0%) kube-system heapster-5d6f9b846c-m4kfp 130m (6%) 130m (6%) 230Mi (7%) 230Mi (7%) kube-system kube-dns-v20-7c7d7d4c66-qqkfm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%) kube-system kube-dns-v20-7c7d7d4c66-wrxjm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%) kube-system kube-proxy-2tb68 100m (5%) 0 (0%) 0 (0%) 0 (0%) kube-system kube-svc-redirect-d6gqm 10m (0%) 0 (0%) 34Mi (1%) 0 (0%) kube-system kubernetes-dashboard-68f468887f-l9x46 100m (5%) 100m (5%) 50Mi (1%) 300Mi (9%) kube-system metrics-server-5cbc77f79f-x55cs 0 (0%) 0 (0%) 0 (0%) 0 (0%) kube-system omsagent-mhrqm 50m (2%) 150m (7%) 150Mi (4%) 300Mi (9%) kube-system omsagent-rs-d688cdf68-pjpmj 50m (2%) 150m (7%) 100Mi (3%) 500Mi (16%) kube-system tiller-deploy-7f4974b9c8-flkjm 0 (0%) 0 (0%) 0 (0%) 0 (0%) kube-system tunnelfront-7f766dd857-kgqps 10m (0%) 0 (0%) 64Mi (2%) 0 (0%) kube-systems-dev nginx-ingress-dev-controller-7f78f6c8f9-csct4 0 (0%) 0 (0%) 0 (0%) 0 (0%) kube-systems-dev nginx-ingress-dev-default-backend-95fbc75b7-lq9tw 0 (0%) 0 (0%) 0 (0%) 0 (0%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 1540m (79%) 540m (27%) memory 2976Mi (98%) 1790Mi (59%) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning ContainerGCFailed 48m (x43 over 19h) kubelet, aks-agentpool-22124581-0 rpc error: code = DeadlineExceeded desc = context deadline exceeded Warning ImageGCFailed 29m (x57 over 18h) kubelet, aks-agentpool-22124581-0 failed to get image stats: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? Warning ContainerGCFailed 2m (x237 over 18h) kubelet, aks-agentpool-22124581-0 rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Fichier de déploiement général:

apiVersion: extensions/v1beta1 kind: Deployment metadata: creationTimestamp: null name: emailgistics-pod spec: minReadySeconds: 10 replicas: 1 strategy: rollingUpdate: maxSurge: 1 maxUnavailable: 1 type: RollingUpdate template: metadata: annotations: sidecar.istio.io/status: '{"version":"ebf16d3ea0236e4b5cb4d3fc0f01da62e2e6265d005e58f8f6bd43a4fb672fdd","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}' creationTimestamp: null labels: app: emailgistics-pod spec: containers: - image: xxxxxxxxxxxxxxxxxxxxx/emailgistics_pod:xxxxxx imagePullPolicy: Always name: emailgistics-pod ports: - containerPort: 80 resources: {} - args: - proxy - sidecar - --configPath - /etc/istio/proxy - --binaryPath - /usr/local/bin/envoy - --serviceCluster - emailgistics-pod - --drainDuration - 45s - --parentShutdownDuration - 1m0s - --discoveryAddress - istio-pilot.istio-system:15005 - --discoveryRefreshDelay - 1s - --zipkinAddress - zipkin.istio-system:9411 - --connectTimeout - 10s - --proxyAdminPort - "15000" - --controlPlaneAuthPolicy - MUTUAL_TLS env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace - name: INSTANCE_IP valueFrom: fieldRef: fieldPath: status.podIP - name: ISTIO_META_POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: ISTIO_META_INTERCEPTION_MODE value: REDIRECT - name: ISTIO_METAJSON_LABELS value: | {"app":"emailgistics-pod"} image: docker.io/istio/proxyv2:1.0.4 imagePullPolicy: IfNotPresent name: istio-proxy ports: - containerPort: 15090 name: http-envoy-prom protocol: TCP resources: requests: cpu: 10m securityContext: readOnlyRootFilesystem: true runAsUser: 1337 volumeMounts: - mountPath: /etc/istio/proxy name: istio-envoy - mountPath: /etc/certs/ name: istio-certs readOnly: true imagePullSecrets: - name: ga.secretname initContainers: - args: - -p - "15001" - -u - "1337" - -m - REDIRECT - -i - '*' - -x - "" - -b - "80" - -d - "" image: docker.io/istio/proxy_init:1.0.4 imagePullPolicy: IfNotPresent name: istio-init resources: {} securityContext: capabilities: add: - NET_ADMIN privileged: true volumes: - emptyDir: medium: Memory name: istio-envoy - name: istio-certs secret: optional: true secretName: istio.default status: {} ---

VKR · Answer

Il s'agit actuellement d'un bug connu et aucun correctif réel n'a été créé pour normaliser le comportement des nœuds. Inspectez les URL ci-dessous:

https://github.com/kubernetes/kubernetes/issues/45419

https://github.com/kubernetes/kubernetes/issues/61117

https://github.com/Azure/AKS/issues/102

J'espère que nous aurons bientôt une solution.