J'ai aks avec un cluster kubernetes ayant 2 nœuds. Chaque nœud a environ 6-7 pod fonctionnant avec 2 conteneurs pour chaque pod. Un conteneur est mon image docker et l'autre est créé par istio pour son maillage de service. Mais après environ 10 heures, les nœuds deviennent "non prêts" et le nœud décrit me montre 2 erreurs: 1. l'exécution du conteneur est en panne, le PLEG n'est pas sain: pleg était en dernier actif il y a 1h32m35.942907195s; le seuil est de 3m0s. Erreur 2.rpc: code = DeadlineExceeded desc = délai de contexte dépassé, impossible de se connecter au démon Docker sous unix: ///var/run/docker.sock. Le démon docker est-il en cours d'exécution?
Lorsque je redémarre le nœud, cela fonctionne bien, mais le nœud revient à "PAS PRÊT" après un certain temps. A commencé à faire face à ce problème depuis l'ajout d'istio, mais n'a trouvé aucun document les concernant. La prochaine étape consiste à essayer de mettre à niveau kubernetes
Le nœud décrit le journal:
Name: aks-agentpool-22124581-0
Roles: agent
Labels: agentpool=agentpool
beta.kubernetes.io/Arch=AMD64
beta.kubernetes.io/instance-type=Standard_B2s
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eastus
failure-domain.beta.kubernetes.io/zone=1
kubernetes.Azure.com/cluster=MC_XXXXXXXXX
kubernetes.io/hostname=aks-XXXXXXXXX
kubernetes.io/role=agent
node-role.kubernetes.io/agent=
storageprofile=managed
storagetier=Premium_LRS
Annotations: aks.Microsoft.com/remediated=3
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Thu, 25 Oct 2018 14:46:53 +0000
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 25 Oct 2018 14:49:06 +0000 Thu, 25 Oct 2018 14:49:06 +0000 RouteCreated RouteController created a route
OutOfDisk False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 19 Dec 2018 19:28:55 +0000 Thu, 25 Oct 2018 14:46:53 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletNotReady container runtime is down,PLEG is not healthy: pleg was lastseen active 1h32m35.942907195s ago; threshold is 3m0s
Addresses:
Hostname: aks-XXXXXXXXX
Capacity:
cpu: 2
ephemeral-storage: 30428648Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4040536Ki
pods: 110
Allocatable:
cpu: 1940m
ephemeral-storage: 28043041951
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3099480Ki
pods: 110
System Info:
Machine ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
System UUID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Boot ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Kernel Version: 4.15.0-1035-Azure
OS Image: Ubuntu 16.04.5 LTS
Operating System: linux
Architecture: AMD64
Container Runtime Version: docker://Unknown
Kubelet Version: v1.11.3
Kube-Proxy Version: v1.11.3
PodCIDR: 10.244.0.0/24
ProviderID: Azure:///subscriptions/9XXXXXXXXXXX/resourceGroups/MC_XXXXXXXXXXXXXXXXXXXXXXXXXXXX/providers/Microsoft.Compute/virtualMachines/aks-XXXXXXXXXXXX
Non-terminated Pods: (42 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default emailgistics-graph-monitor-6477568564-q98p2 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-message-handler-7df4566b6f-mh255 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-reports-aggregator-5fd96b94cb-b5vbn 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-rules-844b77f46-5lrkw 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-scheduler-754884b566-mwgvp 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-subscription-token-manager-7974558985-f2t49 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default mollified-kiwi-cert-manager-665c5d9c8c-2ld59 0 (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system grafana-59b787b9b-dzdtc 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-citadel-5d8956cc6-x55vk 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-egressgateway-f48fc7fbb-szpwp 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-galley-6975b6bd45-g7lsc 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-ingressgateway-c6c4bcdbf-bbgcw 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-pilot-d9b5b9b7c-ln75n 510m (26%) 0 (0%) 2Gi (67%) 0 (0%)
istio-system istio-policy-6b465cd4bf-92l57 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-b2z85 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-j59r4 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-s9pdm 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-sidecar-injector-575597f5cf-npkcz 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-9794j 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-g7gh5 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-Gd88n 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-px8qb 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-xzslh 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-tracing-7596597bd7-hjtq2 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system prometheus-76db5fddd5-d6dxs 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system servicegraph-758f96bf5b-c9sqk 10m (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system addon-http-application-routing-default-http-backend-5ccb95zgfm8 10m (0%) 10m (0%) 20Mi (0%) 20Mi (0%)
kube-system addon-http-application-routing-external-dns-59d8698886-h8xds 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system addon-http-application-routing-nginx-ingress-controller-ff49qc7 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system heapster-5d6f9b846c-m4kfp 130m (6%) 130m (6%) 230Mi (7%) 230Mi (7%)
kube-system kube-dns-v20-7c7d7d4c66-qqkfm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%)
kube-system kube-dns-v20-7c7d7d4c66-wrxjm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%)
kube-system kube-proxy-2tb68 100m (5%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-svc-redirect-d6gqm 10m (0%) 0 (0%) 34Mi (1%) 0 (0%)
kube-system kubernetes-dashboard-68f468887f-l9x46 100m (5%) 100m (5%) 50Mi (1%) 300Mi (9%)
kube-system metrics-server-5cbc77f79f-x55cs 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system omsagent-mhrqm 50m (2%) 150m (7%) 150Mi (4%) 300Mi (9%)
kube-system omsagent-rs-d688cdf68-pjpmj 50m (2%) 150m (7%) 100Mi (3%) 500Mi (16%)
kube-system tiller-deploy-7f4974b9c8-flkjm 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system tunnelfront-7f766dd857-kgqps 10m (0%) 0 (0%) 64Mi (2%) 0 (0%)
kube-systems-dev nginx-ingress-dev-controller-7f78f6c8f9-csct4 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-systems-dev nginx-ingress-dev-default-backend-95fbc75b7-lq9tw 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1540m (79%) 540m (27%)
memory 2976Mi (98%) 1790Mi (59%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ContainerGCFailed 48m (x43 over 19h) kubelet, aks-agentpool-22124581-0 rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning ImageGCFailed 29m (x57 over 18h) kubelet, aks-agentpool-22124581-0 failed to get image stats: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Warning ContainerGCFailed 2m (x237 over 18h) kubelet, aks-agentpool-22124581-0 rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Fichier de déploiement général:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
creationTimestamp: null
name: emailgistics-pod
spec:
minReadySeconds: 10
replicas: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
annotations:
sidecar.istio.io/status: '{"version":"ebf16d3ea0236e4b5cb4d3fc0f01da62e2e6265d005e58f8f6bd43a4fb672fdd","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
creationTimestamp: null
labels:
app: emailgistics-pod
spec:
containers:
- image: xxxxxxxxxxxxxxxxxxxxx/emailgistics_pod:xxxxxx
imagePullPolicy: Always
name: emailgistics-pod
ports:
- containerPort: 80
resources: {}
- args:
- proxy
- sidecar
- --configPath
- /etc/istio/proxy
- --binaryPath
- /usr/local/bin/envoy
- --serviceCluster
- emailgistics-pod
- --drainDuration
- 45s
- --parentShutdownDuration
- 1m0s
- --discoveryAddress
- istio-pilot.istio-system:15005
- --discoveryRefreshDelay
- 1s
- --zipkinAddress
- zipkin.istio-system:9411
- --connectTimeout
- 10s
- --proxyAdminPort
- "15000"
- --controlPlaneAuthPolicy
- MUTUAL_TLS
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: ISTIO_META_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ISTIO_META_INTERCEPTION_MODE
value: REDIRECT
- name: ISTIO_METAJSON_LABELS
value: |
{"app":"emailgistics-pod"}
image: docker.io/istio/proxyv2:1.0.4
imagePullPolicy: IfNotPresent
name: istio-proxy
ports:
- containerPort: 15090
name: http-envoy-prom
protocol: TCP
resources:
requests:
cpu: 10m
securityContext:
readOnlyRootFilesystem: true
runAsUser: 1337
volumeMounts:
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /etc/certs/
name: istio-certs
readOnly: true
imagePullSecrets:
- name: ga.secretname
initContainers:
- args:
- -p
- "15001"
- -u
- "1337"
- -m
- REDIRECT
- -i
- '*'
- -x
- ""
- -b
- "80"
- -d
- ""
image: docker.io/istio/proxy_init:1.0.4
imagePullPolicy: IfNotPresent
name: istio-init
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
volumes:
- emptyDir:
medium: Memory
name: istio-envoy
- name: istio-certs
secret:
optional: true
secretName: istio.default
status: {}
---
Il s'agit actuellement d'un bug connu et aucun correctif réel n'a été créé pour normaliser le comportement des nœuds. Inspectez les URL ci-dessous:
https://github.com/kubernetes/kubernetes/issues/45419
https://github.com/kubernetes/kubernetes/issues/61117
https://github.com/Azure/AKS/issues/102
J'espère que nous aurons bientôt une solution.