
Kubernetes cluster broken: FailedSync and SandboxChanged

kubernetes

Raman · 7 years ago

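I have a Kubernetes 1.7.5 cluster that has somehow gotten into a semi-broken state. Scheduling a new deployment on this cluster partially fails: one of the two pods starts normally, but the second pod never starts. The events are: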

default   2017-09-28 03:57:02 -0400 EDT   2017-09-28 03:57:02 -0400 EDT   1         hello-4059723819-8s35v   Pod       spec.containers{hello}   Normal    Pulled    kubelet, k8s-agentpool1-18117938-2   Successfully pulled image "myregistry.azurecr.io/mybiz/hello"
default   2017-09-28 03:57:02 -0400 EDT   2017-09-28 03:57:02 -0400 EDT   1         hello-4059723819-8s35v   Pod       spec.containers{hello}   Normal    Created   kubelet, k8s-agentpool1-18117938-2   Created container
default   2017-09-28 03:57:03 -0400 EDT   2017-09-28 03:57:03 -0400 EDT   1         hello-4059723819-8s35v   Pod       spec.containers{hello}   Normal    Started   kubelet, k8s-agentpool1-18117938-2   Started container
default   2017-09-28 03:57:13 -0400 EDT   2017-09-28 03:57:01 -0400 EDT   2         hello-4059723819-tj043   Pod                 Warning   FailedSync   kubelet, k8s-agentpool1-18117938-3   Error syncing pod
default   2017-09-28 03:57:13 -0400 EDT   2017-09-28 03:57:02 -0400 EDT   2         hello-4059723819-tj043   Pod                 Normal    SandboxChanged   kubelet, k8s-agentpool1-18117938-3   Pod sandbox changed, it will be killed and re-created.
default   2017-09-28 03:57:24 -0400 EDT   2017-09-28 03:57:01 -0400 EDT   3         hello-4059723819-tj043   Pod                 Warning   FailedSync   kubelet, k8s-agentpool1-18117938-3   Error syncing pod
default   2017-09-28 03:57:25 -0400 EDT   2017-09-28 03:57:02 -0400 EDT   3         hello-4059723819-tj043   Pod                 Normal    SandboxChanged   kubelet, k8s-agentpool1-18117938-3   Pod sandbox changed, it will be killed and re-created.
[...]
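When the event list is long, it can help to tally the warnings per node to see whether the failures are confined to a single agent. The following is a minimal sketch, assuming the column layout shown in the events above; the regular expression and the helper name `warnings_by_node` are illustrative and not part of any Kubernetes tooling.

```python
import re
from collections import Counter

# Matches the pod name, event type, reason and node in one event line.
# The optional group skips the sub-object column (e.g. spec.containers{hello}),
# which is empty for pod-level events.
EVENT_RE = re.compile(
    r"(?P<pod>\S+)\s+Pod\s+(?:\S+\s+)?"
    r"(?P<type>Normal|Warning)\s+(?P<reason>\S+)\s+kubelet, (?P<node>\S+)"
)

# Three lines copied from the event listing above.
SAMPLE_EVENTS = [
    'default   2017-09-28 03:57:03 -0400 EDT   2017-09-28 03:57:03 -0400 EDT   1         hello-4059723819-8s35v   Pod       spec.containers{hello}   Normal    Started   kubelet, k8s-agentpool1-18117938-2   Started container',
    'default   2017-09-28 03:57:13 -0400 EDT   2017-09-28 03:57:01 -0400 EDT   2         hello-4059723819-tj043   Pod                 Warning   FailedSync   kubelet, k8s-agentpool1-18117938-3   Error syncing pod',
    'default   2017-09-28 03:57:24 -0400 EDT   2017-09-28 03:57:01 -0400 EDT   3         hello-4059723819-tj043   Pod                 Warning   FailedSync   kubelet, k8s-agentpool1-18117938-3   Error syncing pod',
]


def warnings_by_node(lines):
    """Count Warning event lines per (node, reason) pair."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match.group("type") == "Warning":
            counts[(match.group("node"), match.group("reason"))] += 1
    return counts


if __name__ == "__main__":
    for (node, reason), n in warnings_by_node(SAMPLE_EVENTS).items():
        print(f"{node}\t{reason}\t{n}")
```

This counts event lines, not the per-event count column, so it only indicates which nodes are affected, not how often each sync failed.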

Eventually, the dashboard shows the error:

Error: failed to start container "hello": Error response from daemon: {"message":"cannot join network of a non running container: 7e95918c6b546714ae20f12349efcc6b4b5b9c1e84b5505cf907807efd57525c"}

The cluster has --runtime-config=batch/v2alpha1=true set in order to use CronJob.

The kubelet logs on the node show that an IP address cannot be allocated:

E0928 20:54:01.733682    1750 pod_workers.go:182] Error syncing pod 65127a94-a425-11e7-8d64-000d3af4357e ("hello-4059723819-xx16n_default(65127a94-a425-11e7-8d64-000d3af4357e)"), skipping: failed to "CreatePodSandbox" for "hello-4059723819-xx16n_default(65127a94-a425-11e7-8d64-000d3af4357e)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hello-4059723819-xx16n_default(65127a94-a425-11e7-8d64-000d3af4357e)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"hello-4059723819-xx16n_default\" network: Failed to allocate address: Failed to delegate: Failed to allocate address: No available addresses"
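To find every pod that hit this condition, the kubelet log can be filtered for the "No available addresses" message. Below is a minimal sketch, assuming log lines shaped like the one above; the pattern and the helper name `pods_without_ip` are illustrative, and the sample line is an abbreviated copy of the log entry quoted in the question.

```python
import re

# Extracts "<pod>_<namespace>" from CNI setup failures caused by an empty
# address pool. Quotes may appear escaped (\") inside the kubelet message.
NO_ADDRESS_RE = re.compile(
    r'failed to set up pod \\?"(?P<pod>[^"\\]+)\\?" network: .*No available addresses'
)

# Abbreviated copy of the kubelet log line shown above.
SAMPLE_LINE = r'''E0928 20:54:01.733682    1750 pod_workers.go:182] Error syncing pod: NetworkPlugin cni failed to set up pod \"hello-4059723819-xx16n_default\" network: Failed to allocate address: Failed to delegate: Failed to allocate address: No available addresses"'''


def pods_without_ip(lines):
    """Return sorted (namespace, pod_name) pairs that failed IP allocation."""
    found = set()
    for line in lines:
        match = NO_ADDRESS_RE.search(line)
        if match:
            # The kubelet formats the identifier as "<name>_<namespace>".
            name, _, namespace = match.group("pod").rpartition("_")
            found.add((namespace, name))
    return sorted(found)


if __name__ == "__main__":
    for namespace, name in pods_without_ip([SAMPLE_LINE]):
        print(f"{namespace}/{name}")
```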

1 answer

Raman · 7 years ago

This is a bug in Azure CNI, which does not always correctly reclaim IP addresses from terminated pods. See this issue: https://github.com/Azure/azure-container-networking/issues/76

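The reason this happened after enabling CronJob is presumably that cron jobs create many short-lived pods, each of which is allocated an IP address. If those addresses are not reclaimed when the pods terminate, the node's pool runs out quickly.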
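To illustrate why short-lived pods make an address leak visible so quickly, here is a toy model. It is not how Azure CNI works internally: the pool size, the leak rate, and the function name `pods_until_exhaustion` are all hypothetical, chosen only to show the arithmetic.

```python
def pods_until_exhaustion(pool_size: int, leak_every: int) -> int:
    """Return how many short-lived pods can start before the pool is empty.

    Toy model: pods run one at a time. Each pod takes one address when it
    starts and terminates before the next one starts. Every `leak_every`-th
    termination fails to return its address to the pool.
    """
    if pool_size <= 0 or leak_every <= 0:
        raise ValueError("pool_size and leak_every must be positive")

    free = pool_size
    started = 0
    while free > 0:
        free -= 1          # the pod is assigned an address
        started += 1
        if started % leak_every != 0:
            free += 1      # the address is reclaimed on termination
        # otherwise the address is leaked and never returns to the pool
    return started


if __name__ == "__main__":
    # Hypothetical numbers: 30 addresses per node, 1 in 10 terminations leaks.
    print(pods_until_exhaustion(pool_size=30, leak_every=10))
```

Under these assumptions the pool lasts for pool_size × leak_every pod starts, so a job that runs every minute would drain a 30-address pool in about five hours, while a long-running deployment that rarely restarts might never notice the leak.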
