
What is the reason for "Back-off restarting failed container" for an Elasticsearch Kubernetes pod?

  •  2
  • Lakshya Garg  · Tech Community  · 6 years ago

    kubectl get pods

    NAME                  READY     STATUS    RESTARTS   AGE
    es-764bd45bb6-w4ckn   0/1       Error     4          3m
    

    Here is the output of kubectl describe pod:

    Name:           es-764bd45bb6-w4ckn
    Namespace:      default
    Node:           administrator-thinkpad-l480/<node_ip>
    Start Time:     Thu, 30 Aug 2018 16:38:08 +0530
    Labels:         io.kompose.service=es
                pod-template-hash=3206801662
    Annotations:    <none> 
    Status:         Running
    IP:             10.32.0.8
    Controlled By:  ReplicaSet/es-764bd45bb6
    Containers:
    es:
    Container ID:   docker://9be2f7d6eb5d7793908852423716152b8cefa22ee2bb06fbbe69faee6f6aa3c3
    Image:          docker.elastic.co/elasticsearch/elasticsearch:6.2.4
    Image ID:       docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:9ae20c753f18e27d1dd167b8675ba95de20b1f1ae5999aae5077fa2daf38919e
    Port:           9200/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    78
      Started:      Thu, 30 Aug 2018 16:42:56 +0530
      Finished:     Thu, 30 Aug 2018 16:43:07 +0530
    Ready:          False
    Restart Count:  5
    Environment:
      ELASTICSEARCH_ADVERTISED_HOST_NAME:  es
      ES_JAVA_OPTS:                        -Xms2g -Xmx2g
      ES_HEAP_SIZE:                        2GB
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-nhb9z (ro)
    Conditions:
      Type              Status
      Initialized       True 
      Ready             False 
      ContainersReady   False 
      PodScheduled      True 
    Volumes:
      default-token-nhb9z:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-nhb9z
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age               From           Message
      ----     ------     ----              ----           -------
     Normal   Scheduled  6m                default-scheduler                     Successfully assigned default/es-764bd45bb6-w4ckn to administrator-thinkpad-l480
     Normal   Pulled     3m (x5 over 6m)   kubelet, administrator-thinkpad-l480  Container image "docker.elastic.co/elasticsearch/elasticsearch:6.2.4" already present on machine
     Normal   Created    3m (x5 over 6m)   kubelet, administrator-thinkpad-l480  Created container
     Normal   Started    3m (x5 over 6m)   kubelet, administrator-thinkpad-l480  Started container
     Warning  BackOff    1m (x15 over 5m)  kubelet, administrator-thinkpad-l480  Back-off restarting failed container
    

    Here is the Deployment manifest:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.1.0 (36652f6)
      creationTimestamp: null
      labels:
        io.kompose.service: es
      name: es
    spec:
      replicas: 1
      strategy: {}
      template:
        metadata:
          creationTimestamp: null
          labels:
            io.kompose.service: es
        spec:
          containers:
          - env:
            - name: ELASTICSEARCH_ADVERTISED_HOST_NAME
              value: es
            - name: ES_JAVA_OPTS
              value: -Xms2g -Xmx2g
            - name: ES_HEAP_SIZE
              value: 2GB
            image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
            name: es
            ports:
            - containerPort: 9200
            resources: {}
          restartPolicy: Always
    status: {}
    

    When I try kubectl logs -f es-764bd45bb6-w4ckn, I get:

    Error from server: Get https://<slave node ip>:10250/containerLogs/default/es-764bd45bb6-w4ckn/es?previous=true: dial tcp <slave node ip>:10250: i/o timeout 
    

    What is the cause of this problem, and how can it be fixed?

    2 answers  |  as of 4 years ago
        1
  •  11
  •   Pradeep    6 years ago

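    I had the same problem, and there can be a couple of reasons for it. In my case, a jar file was missing. @Lakshya has already answered this question; I would like to add the steps you can take to troubleshoot it.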

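    1. Get the pod status, command - kubectl get pods
    2. Describe the pod - kubectl describe pod "pod name". The last few lines of the output list the events and show where your deployment failed
    3. Get the logs for more details - kubectl logs "pod name"
    4. Get the container logs - kubectl logs "pod name" -c "container name". Get the container name from the output of the describe pod command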

    If your container is up, you can use the kubectl exec -it command to analyze the container further.
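
    The steps above can be chained into a small script. This is only a minimal sketch: the diagnose_pod function name and the KUBECTL override variable are hypothetical and not part of kubectl. It wraps the standard get, describe, and logs subcommands, and adds logs --previous, which asks for the output of the last terminated container instance and so tends to be the useful one during a CrashLoopBackOff.

```shell
#!/bin/sh
# Hypothetical helper: collect the usual diagnostics for a crashing pod.
# Usage: diagnose_pod <pod-name> [container-name]
diagnose_pod() {
    pod="${1:-}"
    container="${2:-}"
    # Assumption: KUBECTL lets you swap the binary (e.g. "echo" for a dry run).
    kubectl_cmd="${KUBECTL:-kubectl}"

    if [ -z "$pod" ]; then
        echo "usage: diagnose_pod <pod-name> [container-name]" >&2
        return 2
    fi

    # Step 1: pod status (READY, STATUS, RESTARTS).
    "$kubectl_cmd" get pod "$pod"

    # Step 2: full description; the Events section is at the end.
    "$kubectl_cmd" describe pod "$pod"

    # Steps 3 and 4: logs of the current and of the previous container instance.
    if [ -n "$container" ]; then
        "$kubectl_cmd" logs "$pod" -c "$container"
        "$kubectl_cmd" logs "$pod" -c "$container" --previous
    else
        "$kubectl_cmd" logs "$pod"
        "$kubectl_cmd" logs "$pod" --previous
    fi
}
```

    For the pod in the question this would be called as diagnose_pod es-764bd45bb6-w4ckn es. If kubectl logs itself times out, as it does in the question, the API server cannot reach the kubelet on that node, and reading the container output directly on the node with docker logs is a fallback.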

        2
  •  0
  •   Lakshya Garg    6 years ago

    I retrieved the logs with docker logs and found that es was not starting because of vm.max_map_count. I changed vm.max_map_count using sysctl -w vm.max_map_count=262144
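
    Elasticsearch's bootstrap checks expect vm.max_map_count on the host to be at least 262144. Before changing the setting, it can help to confirm that this is the problem on the node that runs the pod. The following is a minimal sketch for a Linux node; the map_count_ok function name and the ES_MIN_MAP_COUNT variable are hypothetical.

```shell
#!/bin/sh
# Minimum vm.max_map_count expected by Elasticsearch's bootstrap check.
ES_MIN_MAP_COUNT=262144

# Hypothetical helper: succeed if the value meets the minimum.
# With no argument, read the live kernel value from /proc (Linux only).
map_count_ok() {
    current="${1:-$(cat /proc/sys/vm/max_map_count)}"
    [ "$current" -ge "$ES_MIN_MAP_COUNT" ]
}
```

    On the node, map_count_ok || sudo sysctl -w vm.max_map_count=262144 raises the value only when it is too low. Note that sysctl -w does not survive a reboot; adding vm.max_map_count=262144 to /etc/sysctl.conf makes the change persistent.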