
How to configure static hostnames for multiple replicas of a Flink TaskManager Deployment in Kubernetes and scrape them in the Prometheus ConfigMap?

  •  Felipe  ·  asked 4 years ago

    I have a Flink JobManager with a single TaskManager running on top of Kubernetes. For that I use a Service and a Deployment for the TaskManager with replicas: 1. The Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: flink-taskmanager
    spec:
      type: ClusterIP
      ports:
      - name: prometheus
        port: 9250
      selector:
        app: flink
        component: taskmanager
    

    The Deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flink-taskmanager
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: flink
          component: taskmanager
      template:
        metadata:
          labels:
            app: flink
            component: taskmanager
        spec:
          hostname: flink-taskmanager
          volumes:
          - name: flink-config-volume
            configMap:
              name: flink-config
              items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
          - name: tpch-dbgen-data
            persistentVolumeClaim:
              claimName: tpch-dbgen-data-pvc
          - name: tpch-dbgen-datarate
            persistentVolumeClaim:
              claimName: tpch-dbgen-datarate-pvc
          containers:
          - name: taskmanager
            image: felipeogutierrez/explore-flink:1.11.1-scala_2.12
            # imagePullPolicy: Always
            env:
            args: ["taskmanager"]
            ports:
            - containerPort: 6122
              name: rpc
            - containerPort: 6125
              name: query-state
            - containerPort: 9250
            livenessProbe:
              tcpSocket:
                port: 6122
              initialDelaySeconds: 30
              periodSeconds: 60
            volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf/
            - name: tpch-dbgen-data
              mountPath: /opt/tpch-dbgen/data
              subPath: data
            - mountPath: /tmp
              name: tpch-dbgen-datarate
              subPath: tmp
            securityContext:
              runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
    

    Then I export the metrics from the Flink TaskManager to Prometheus, using a Service, a ConfigMap, and a Deployment to set Prometheus up on top of Kubernetes and have it scrape the Flink TaskManager.
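    For port 9250 on the TaskManager to actually serve metrics, the Prometheus reporter has to be enabled in the flink-conf.yaml that the flink-config ConfigMap mounts into the container. That ConfigMap is not shown here, so the snippet below is only a minimal sketch of the relevant settings, assuming the PrometheusReporter bundled with Flink 1.11:

    # flink-conf.yaml (excerpt) -- illustrative only; the actual flink-config
    # ConfigMap is not part of the question.
    metrics.reporters: prom
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9250-9260  # 9250 matches the containerPort and the Service port above

    The Service for Prometheus: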

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus-service
    spec:
      type: ClusterIP
      ports:
      - name: promui
        protocol: TCP
        port: 9090
        targetPort: 9090
      selector:
        app: flink
        component: prometheus
    

    The ConfigMap is where I set the Flink TaskManager host in - targets: ['flink-jobmanager:9250', 'flink-jobmanager:9251', 'flink-taskmanager:9250'], which matches the Kubernetes Service for Flink (flink-taskmanager):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      labels:
        app: flink
    data:
      prometheus.yml: |+
        global:
          scrape_interval: 15s
    
        scrape_configs:
          - job_name: 'prometheus'
            scrape_interval: 5s
            static_configs:
              - targets: ['localhost:9090']
          - job_name: 'flink'
            scrape_interval: 5s
            static_configs:
              - targets: ['flink-jobmanager:9250', 'flink-jobmanager:9251', 'flink-taskmanager:9250']
            metrics_path: /
    

    The Deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus-deployment
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: flink
          component: prometheus
      template:
        metadata:
          labels:
            app: flink
            component: prometheus
        spec:
          hostname: prometheus
          volumes:
          - name: prometheus-config-volume
            configMap:
              name: prometheus-config
              items:
              - key: prometheus.yml
                path: prometheus.yml
          containers:
          - name: prometheus
            image: prom/prometheus
            ports:
            - containerPort: 9090
            volumeMounts:
              - name: prometheus-config-volume
                mountPath: /etc/prometheus/prometheus.yml
                subPath: prometheus.yml
    

    This works fine and I can query the Flink TaskManager metrics on the Prometheus web UI. However, as soon as I change replicas: 1 to replicas: 3, for example, I can no longer query the TaskManager metrics. I suppose this is because the configuration - targets: ['flink-jobmanager:9250', 'flink-jobmanager:9251', 'flink-taskmanager:9250'] is no longer valid once there is more than one TaskManager replica, since the single flink-taskmanager Service name cannot address each replica individually. But because Kubernetes manages the creation of the new TaskManager replicas, I don't know what to configure for this Prometheus option. I imagine it should be dynamic, or use a * or some regex that picks up all the TaskManagers for me. Does anyone know how to configure this?

    1 Answer

  •  Felipe  ·  answered 4 years ago

    I had to solve this based on this answer https://stackoverflow.com/a/55139221/2096986 and the documentation. First, I had to use a StatefulSet instead of a Deployment. With that, the pods get stable, stateful identities. What was not clear to me is that I also had to configure the Service with clusterIP: None (a headless Service) instead of type: ClusterIP. Here is my Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: flink-taskmanager
      labels:
        app: flink-taskmanager
    spec:
      clusterIP: None # type: ClusterIP
      ports:
      - name: prometheus
        port: 9250
      selector:
        app: flink-taskmanager
    

    And here is my StatefulSet:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: flink-taskmanager
    spec:
      replicas: 3
      serviceName: flink-taskmanager
      selector:
        matchLabels:
          app: flink-taskmanager # has to match .spec.template.metadata.labels
      template:
        metadata:
          labels:
            app: flink-taskmanager # has to match .spec.selector.matchLabels
        spec:
          hostname: flink-taskmanager
          volumes:
          - name: flink-config-volume
            configMap:
              name: flink-config
              items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
          - name: tpch-dbgen-data
            persistentVolumeClaim:
              claimName: tpch-dbgen-data-pvc
          - name: tpch-dbgen-datarate
            persistentVolumeClaim:
              claimName: tpch-dbgen-datarate-pvc
          containers:
          - name: taskmanager
            image: felipeogutierrez/explore-flink:1.11.1-scala_2.12
            # imagePullPolicy: Always
            env:
            args: ["taskmanager"]
            ports:
            - containerPort: 6122
              name: rpc
            - containerPort: 6125
              name: query-state
            - containerPort: 9250
            livenessProbe:
              tcpSocket:
                port: 6122
              initialDelaySeconds: 30
              periodSeconds: 60
            volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf/
            - name: tpch-dbgen-data
              mountPath: /opt/tpch-dbgen/data
              subPath: data
            - mountPath: /tmp
              name: tpch-dbgen-datarate
              subPath: tmp
            securityContext:
              runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
    

    And in the Prometheus configuration file prometheus.yml I address the hosts with the pattern StatefulSetName-{0..N-1}.ServiceName.default.svc.cluster.local:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      labels:
        app: flink
    data:
      prometheus.yml: |+
        global:
          scrape_interval: 15s
    
        scrape_configs:
          - job_name: 'prometheus'
            scrape_interval: 5s
            static_configs:
              - targets: ['localhost:9090']
          - job_name: 'flink'
            scrape_interval: 5s
            static_configs:
              - targets: ['flink-jobmanager:9250', 'flink-jobmanager:9251', 'flink-taskmanager-0.flink-taskmanager.default.svc.cluster.local:9250', 'flink-taskmanager-1.flink-taskmanager.default.svc.cluster.local:9250', 'flink-taskmanager-2.flink-taskmanager.default.svc.cluster.local:9250']
            metrics_path: /
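
    One caveat of this solution is that the static target list has to be edited whenever the StatefulSet's replicas count changes. As a dynamic alternative (not what is used above), Prometheus' built-in Kubernetes service discovery can find the TaskManager pods by label. The sketch below is only an illustration: it assumes the app/component labels from the Deployment in the question and requires the Prometheus pod's ServiceAccount to be allowed to list pods:

    # Hypothetical scrape job using Kubernetes pod discovery instead of static
    # hostnames; adjust the label values if your manifests use different labels.
    scrape_configs:
      - job_name: 'flink-taskmanagers'
        scrape_interval: 5s
        metrics_path: /
        kubernetes_sd_configs:
          - role: pod   # one target per declared container port per pod
        relabel_configs:
          # keep only pods labelled as Flink TaskManagers
          - source_labels: [__meta_kubernetes_pod_label_app, __meta_kubernetes_pod_label_component]
            regex: flink;taskmanager
            action: keep
          # keep only the metrics reporter port (9250)
          - source_labels: [__meta_kubernetes_pod_container_port_number]
            regex: "9250"
            action: keep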