
k8s pod unable to connect to another pod

asked by javamonkey79 · 5 years ago

    I'm running Airflow on k8s via the airflow helm chart. However, the web pod doesn't seem to be able to connect to postgresql. Oddly enough, the other pods can.

    I put together a small script to check, and here is what I found:

    [root@ip-10-56-173-248 bin]# cat checkpostgres.sh
    docker exec -u root $1 /bin/nc -zvw2 airflow-postgresql 5432
    [root@ip-10-56-173-248 bin]# docker ps --format '{{.Names}}\t{{.ID}}'|grep k8s_airflow|grep default|awk '{printf("%s ",$1); system("checkpostgres.sh " $2)}'
    k8s_airflow-web_airflow-web-57c6dcd77b-dvjmv_default_67d74586-284b-11ea-8021-0249931777ef_74 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) : Connection timed out
    k8s_airflow-worker_airflow-worker-0_default_67e1703a-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
    k8s_airflow-scheduler_airflow-scheduler-5d9b688ccf-zdjdl_default_67d3fab4-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
    k8s_airflow-postgresql_airflow-postgresql-76c954bb7f-gwq68_default_67d1cf3d-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
    k8s_airflow-redis_airflow-redis-master-0_default_67d9aa36-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (?) open
    k8s_airflow-flower_airflow-flower-79c999764d-d4q58_default_67d267e2-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
    
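    The same per-container probe can also be run from inside a pod (e.g. via `kubectl exec`) without relying on `nc` being installed in the image. This is a minimal sketch; the service name and port mirror the airflow-postgresql service above:

    ```python
    import socket

    def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Run inside the failing web pod; prints False there if packets are dropped.
    print(check_tcp("airflow-postgresql", 5432))
    ```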

    Here is my k8s version info:

    ➜  ~ kubectl version
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:11:03Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-c0eccc", GitCommit:"c0eccca51d7500bb03b2f163dd8d534ffeb2f7a2", GitTreeState:"clean", BuildDate:"2019-12-22T23:14:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
    

    When I nslookup the pod name, it seems to work fine:

    # nslookup airflow-postgresql
    Server:     172.20.0.10
    Address:    172.20.0.10#53
    
    Non-authoritative answer:
    Name:   airflow-postgresql.default.svc.cluster.local
    Address: 172.20.166.209
    
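    So name resolution succeeds while the TCP connection from the web pod times out, which points away from DNS and toward something dropping packets (security groups, network policies). The two layers can be separated explicitly with a small sketch like this (host names here are only illustrative):

    ```python
    import socket

    def diagnose(host: str, port: int, timeout: float = 2.0) -> str:
        """Distinguish a DNS failure from a TCP connectivity failure."""
        try:
            addr = socket.gethostbyname(host)   # DNS step (what nslookup tests)
        except socket.gaierror:
            return "dns-failure"
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return f"ok ({addr})"
        except socket.timeout:
            return f"tcp-timeout ({addr})"      # resolved fine, packets dropped
        except OSError:
            return f"tcp-refused-or-error ({addr})"

    print(diagnose("airflow-postgresql", 5432))
    ```

    A `tcp-timeout` result (rather than `tcp-refused-or-error`) is the classic signature of a firewall or security-group drop, since a reachable host with a closed port would actively refuse.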

    Edit: as requested, here is the EKS setup:

    amazon-eks-nodegroup.yaml:

    ---
    AWSTemplateFormatVersion: '2010-09-09'
    Description: 'Amazon EKS - Node Group'
    
    Parameters:
    
      KeyName:
        Description: The EC2 Key Pair to allow SSH access to the instances
        Type: AWS::EC2::KeyPair::KeyName
    
      NodeImageId:
        Type: AWS::EC2::Image::Id
        Description: AMI id for the node instances.
    
      NodeInstanceType:
        Description: EC2 instance type for the node instances
        Type: String
        Default: t3.medium
        AllowedValues:
        - t2.small
        - t2.medium
        - t2.large
        - t2.xlarge
        - t2.2xlarge
        - t3.nano
        - t3.micro
        - t3.small
        - t3.medium
        - t3.large
        - t3.xlarge
        - t3.2xlarge
        - m3.medium
        - m3.large
        - m3.xlarge
        - m3.2xlarge
        - m4.large
        - m4.xlarge
        - m4.2xlarge
        - m4.4xlarge
        - m4.10xlarge
        - m5.large
        - m5.xlarge
        - m5.2xlarge
        - m5.4xlarge
        - m5.12xlarge
        - m5.24xlarge
        - c4.large
        - c4.xlarge
        - c4.2xlarge
        - c4.4xlarge
        - c4.8xlarge
        - c5.large
        - c5.xlarge
        - c5.2xlarge
        - c5.4xlarge
        - c5.9xlarge
        - c5.18xlarge
        - i3.large
        - i3.xlarge
        - i3.2xlarge
        - i3.4xlarge
        - i3.8xlarge
        - i3.16xlarge
        - r3.xlarge
        - r3.2xlarge
        - r3.4xlarge
        - r3.8xlarge
        - r4.large
        - r4.xlarge
        - r4.2xlarge
        - r4.4xlarge
        - r4.8xlarge
        - r4.16xlarge
        - x1.16xlarge
        - x1.32xlarge
        - p2.xlarge
        - p2.8xlarge
        - p2.16xlarge
        - p3.2xlarge
        - p3.8xlarge
        - p3.16xlarge
        - r5.large
        - r5.xlarge
        - r5.2xlarge
        - r5.4xlarge
        - r5.12xlarge
        - r5.24xlarge
        - r5d.large
        - r5d.xlarge
        - r5d.2xlarge
        - r5d.4xlarge
        - r5d.12xlarge
        - r5d.24xlarge
        - z1d.large
        - z1d.xlarge
        - z1d.2xlarge
        - z1d.3xlarge
        - z1d.6xlarge
        - z1d.12xlarge
        ConstraintDescription: Must be a valid EC2 instance type
    
      NodeAutoScalingGroupMinSize:
        Type: Number
        Description: Minimum size of Node Group ASG.
        Default: 1
    
      NodeAutoScalingGroupMaxSize:
        Type: Number
        Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.
        Default: 4
    
      NodeAutoScalingGroupDesiredCapacity:
        Type: Number
        Description: Desired capacity of Node Group ASG.
        Default: 3
    
      NodeVolumeSize:
        Type: Number
        Description: Node volume size
        Default: 20
    
      ClusterName:
        Description: The cluster name provided when the cluster was created. If it is incorrect, nodes will not be able to join the cluster. i.e. "eks"
        Type: String
    
      Environment:
        Description: the Environment value provided when the cluster was created. i.e. "dev"
        Type: String
    
      BootstrapArguments:
        Description: Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami
        Default: ""
        Type: String
    
      VpcId:
        Description: The VPC of the worker instances stack reference
        Type: String
    
      Subnets:
        Description: The subnets where workers can be created.
        Type: String
    
    Metadata:
      AWS::CloudFormation::Interface:
        ParameterGroups:
          -
            Label:
              default: "EKS Cluster"
            Parameters:
              - ClusterName
          -
            Label:
              default: "dev"
            Parameters:
              - Environment
          -
            Label:
              default: "Worker Node Configuration"
            Parameters:
              - NodeAutoScalingGroupMinSize
              - NodeAutoScalingGroupDesiredCapacity
              - NodeAutoScalingGroupMaxSize
              - NodeInstanceType
              - NodeImageId
              - NodeVolumeSize
              - KeyName
              - BootstrapArguments
          -
            Label:
              default: "Worker Network Configuration"
            Parameters:
              - VpcId
              - Subnets
    
    Resources:
    
      NodeInstanceProfile:
        Type: AWS::IAM::InstanceProfile
        Properties:
          InstanceProfileName: !Sub "${ClusterName}-${Environment}-cluster-node-instance-profile"
          Path: "/"
          Roles:
          - !Ref NodeInstanceRole
    
      NodeInstanceRole:
        Type: AWS::IAM::Role
        Properties:
          RoleName: !Sub "${ClusterName}-${Environment}-cluster-node-instance-role"
          AssumeRolePolicyDocument:
            Version: '2012-10-17'
            Statement:
            - Effect: Allow
              Principal:
                Service:
                - ec2.amazonaws.com
              Action:
              - sts:AssumeRole
          Path: "/"
          ManagedPolicyArns:
            - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
            - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
            - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
            - arn:aws:iam::aws:policy/AmazonS3FullAccess
            - arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
          Policies:
            -
              PolicyName: "change-r53-recordsets"
              PolicyDocument:
                Version: "2012-10-17"
                Statement:
                  -
                    Effect: Allow
                    Action: route53:ChangeResourceRecordSets
                    Resource: !Sub
                      - "arn:aws:route53:::hostedzone/${ZoneId}"
                      - {ZoneId: !ImportValue DNS-AccountZoneID}
            -
              PolicyName: "list-r53-resources"
              PolicyDocument:
                Version: "2012-10-17"
                Statement:
                  -
                    Effect: Allow
                    Action:
                      - route53:ListHostedZones
                      - route53:ListResourceRecordSets
                    Resource: "*"
    
      NodeSecurityGroup:
        Type: AWS::EC2::SecurityGroup
        Properties:
          GroupDescription: Security group for all nodes in the cluster
          GroupName: !Sub "${ClusterName}-${Environment}-cluster-security-group"
          VpcId:
            Fn::ImportValue:
              !Sub ${VpcId}-vpcid
          Tags:
          - Key: !Sub "kubernetes.io/cluster/${ClusterName}-${Environment}-cluster"
            Value: 'owned'
    
      NodeSecurityGroupIngress:
        Type: AWS::EC2::SecurityGroupIngress
        DependsOn: NodeSecurityGroup
        Properties:
          Description: Allow node to communicate with each other
          GroupId: !Ref NodeSecurityGroup
          SourceSecurityGroupId: !Ref NodeSecurityGroup
          IpProtocol: '-1'
          FromPort: 0
          ToPort: 65535
    
      NodeSecurityGroupFromControlPlaneIngress:
        Type: AWS::EC2::SecurityGroupIngress
        DependsOn: NodeSecurityGroup
        Properties:
          Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
          GroupId: !Ref NodeSecurityGroup
          SourceSecurityGroupId:
            Fn::ImportValue:
              !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
          IpProtocol: tcp
          FromPort: 1025
          ToPort: 65535
    
      ControlPlaneEgressToNodeSecurityGroup:
        Type: AWS::EC2::SecurityGroupEgress
        DependsOn: NodeSecurityGroup
        Properties:
          Description: Allow the cluster control plane to communicate with worker Kubelet and pods
          GroupId:
            Fn::ImportValue:
              !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
          DestinationSecurityGroupId: !Ref NodeSecurityGroup
          IpProtocol: tcp
          FromPort: 1025
          ToPort: 65535
    
      NodeSecurityGroupFromControlPlaneOn443Ingress:
        Type: AWS::EC2::SecurityGroupIngress
        DependsOn: NodeSecurityGroup
        Properties:
          Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
          GroupId: !Ref NodeSecurityGroup
          SourceSecurityGroupId:
            Fn::ImportValue:
              !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
          IpProtocol: tcp
          FromPort: 443
          ToPort: 443
    
      ControlPlaneEgressToNodeSecurityGroupOn443:
        Type: AWS::EC2::SecurityGroupEgress
        DependsOn: NodeSecurityGroup
        Properties:
          Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
          GroupId:
            Fn::ImportValue:
              !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
          DestinationSecurityGroupId: !Ref NodeSecurityGroup
          IpProtocol: tcp
          FromPort: 443
          ToPort: 443
    
      ClusterControlPlaneSecurityGroupIngress:
        Type: AWS::EC2::SecurityGroupIngress
        DependsOn: NodeSecurityGroup
        Properties:
          Description: Allow pods to communicate with the cluster API Server
          GroupId:
            Fn::ImportValue:
              !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
          SourceSecurityGroupId: !Ref NodeSecurityGroup
          IpProtocol: tcp
          ToPort: 443
          FromPort: 443
    
      NodeGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Properties:
          AutoScalingGroupName: !Sub "${ClusterName}-${Environment}-cluster-nodegroup"
          DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
          LaunchConfigurationName: !Ref NodeLaunchConfig
          MinSize: !Ref NodeAutoScalingGroupMinSize
          MaxSize: !Ref NodeAutoScalingGroupMaxSize
          VPCZoneIdentifier:
            - Fn::Select:
              - 0
              - Fn::Split:
                - ","
                - Fn::ImportValue:
                    !Sub ${Subnets}
            - Fn::Select:
              - 1
              - Fn::Split:
                - ","
                - Fn::ImportValue:
                    !Sub ${Subnets}
            - Fn::Select:
              - 2
              - Fn::Split:
                - ","
                - Fn::ImportValue:
                    !Sub ${Subnets}
          Tags:
          - Key: Name
            Value: !Sub "${ClusterName}-${Environment}-cluster-nodegroup"
            PropagateAtLaunch: 'true'
          - Key: !Sub 'kubernetes.io/cluster/${ClusterName}-${Environment}-cluster'
            Value: 'owned'
            PropagateAtLaunch: 'true'
        UpdatePolicy:
          AutoScalingRollingUpdate:
            MaxBatchSize: '1'
            MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
            PauseTime: 'PT5M'
    
      NodeLaunchConfig:
        Type: AWS::AutoScaling::LaunchConfiguration
        Properties:
          LaunchConfigurationName: !Sub "${ClusterName}-${Environment}-cluster-node-launch-config"
          AssociatePublicIpAddress: 'true'
          IamInstanceProfile: !Ref NodeInstanceProfile
          ImageId: !Ref NodeImageId
          InstanceType: !Ref NodeInstanceType
          KeyName: !Ref KeyName
          SecurityGroups:
          - !Ref NodeSecurityGroup
          BlockDeviceMappings:
            - DeviceName: /dev/xvda
              Ebs:
                VolumeSize: !Ref NodeVolumeSize
                VolumeType: gp2
                DeleteOnTermination: true
          UserData:
            Fn::Base64:
              !Sub |
                #!/bin/bash
                set -o xtrace
                /etc/eks/bootstrap.sh ${BootstrapArguments} ${ClusterName}-${Environment}-cluster
                sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
                sudo start amazon-ssm-agent
                sudo sysctl -w vm.max_map_count=262144
                /opt/aws/bin/cfn-signal --exit-code $? \
                         --stack  ${AWS::StackName} \
                         --resource NodeGroup  \
                         --region ${AWS::Region}
    
    Outputs:
    
      NodeInstanceRole:
        Description: The node instance role
        Value: !GetAtt NodeInstanceRole.Arn
        Export:
          Name: !Sub "${ClusterName}-${Environment}-cluster-nodegroup-rolearn"
    
      NodeSecurityGroup:
        Description: The security group for the node group
        Value: !Ref NodeSecurityGroup
    

    amazon-eks-cluster.yaml:

    AWSTemplateFormatVersion: '2010-09-09'
    Description: 'Amazon EKS - Cluster'
    
    Parameters:
    
      VPCStack:
        Type: String
        Description: VPC Stack Name
    
      ClusterName:
        Type: String
        Description: EKS Cluster Name (i.e. "eks")
    
      Environment:
        Type: String
        Description: Environment for this Cluster (i.e. "dev") which will be appended to the ClusterName (i.e. "eks-dev")
    
    Resources:
    
      ClusterRole:
        Description: Allows EKS to manage clusters on your behalf.
        Type: AWS::IAM::Role
        Properties:
          RoleName: !Sub "${ClusterName}-${Environment}-cluster-role"
          AssumeRolePolicyDocument:
            Version: 2012-10-17
            Statement:
                Effect: Allow
                Principal:
                  Service:
                    - eks.amazonaws.com
                Action: sts:AssumeRole
          ManagedPolicyArns:
            - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
            - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
          Policies:
            -
              PolicyName: "change-r53-recordsets"
              PolicyDocument:
                Version: "2012-10-17"
                Statement:
                  -
                    Effect: Allow
                    Action: route53:ChangeResourceRecordSets
                    Resource: !Sub
                      - "arn:aws:route53:::hostedzone/${ZoneId}"
                      - {ZoneId: !ImportValue DNS-AccountZoneID}
            -
              PolicyName: "list-r53-resources"
              PolicyDocument:
                Version: "2012-10-17"
                Statement:
                  -
                    Effect: Allow
                    Action:
                      - route53:ListHostedZones
                      - route53:ListResourceRecordSets
                    Resource: "*"
    
      ClusterControlPlaneSecurityGroup:
        Type: AWS::EC2::SecurityGroup
        Properties:
          GroupName: !Sub "${ClusterName}-${Environment}-cluster-control-plane-sg"
          GroupDescription: Cluster communication with worker nodes
          VpcId:
            Fn::ImportValue:
              !Sub "${VPCStack}-vpcid"
    
      Cluster:
        Type: "AWS::EKS::Cluster"
        Properties:
          Version: "1.14"
          Name: !Sub "${ClusterName}-${Environment}-cluster"
          RoleArn: !GetAtt ClusterRole.Arn
          ResourcesVpcConfig:
            SecurityGroupIds:
              - !Ref ClusterControlPlaneSecurityGroup
            SubnetIds:
              - Fn::Select:
                - 0
                - Fn::Split:
                  - ","
                  - Fn::ImportValue:
                      !Sub "${VPCStack}-privatesubnets"
              - Fn::Select:
                - 1
                - Fn::Split:
                  - ","
                  - Fn::ImportValue:
                      !Sub "${VPCStack}-privatesubnets"
              - Fn::Select:
                - 2
                - Fn::Split:
                  - ","
                  - Fn::ImportValue:
                      !Sub "${VPCStack}-privatesubnets"
    
      Route53Cname:
        Type: "AWS::Route53::RecordSet"
        Properties:
          HostedZoneId: !ImportValue DNS-AccountZoneID
          Comment: CNAME for Control Plane Endpoint
          Name: !Sub
            - "k8s.${Environment}.${Zone}"
            - { Zone: !ImportValue Main-zone-name}
          Type: CNAME
          TTL: '900'
          ResourceRecords:
            - !GetAtt Cluster.Endpoint
    
    Outputs:
      ClusterName:
        Value: !Ref Cluster
        Description: Cluster Name
        Export:
          Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterName"
    
      ClusterArn:
        Value: !GetAtt Cluster.Arn
        Description: Cluster Arn
        Export:
          Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterArn"
    
      ClusterEndpoint:
        Value: !GetAtt Cluster.Endpoint
        Description: Cluster Endpoint
        Export:
          Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterEndpoint"
    
      ClusterControlPlaneSecurityGroup:
        Value: !Ref ClusterControlPlaneSecurityGroup
        Description: ClusterControlPlaneSecurityGroup
        Export:
          Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
    

    cluster-parameters.json:

    [
      {
        "ParameterKey": "VPCStack",
        "ParameterValue": "Main"
      },
      {
        "ParameterKey": "ClusterName",
        "ParameterValue": "amundsen-eks"
      },
      {
        "ParameterKey": "Environment",
        "ParameterValue": "dev"
      }
    ]
    

    nodegroup-parameters.json:

    [
      {
        "ParameterKey": "KeyName",
        "ParameterValue": "data-warehouse-dev"
      },
      {
        "ParameterKey": "NodeImageId",
        "ParameterValue": "ami-08739803f18dcc019"
      },
      {
        "ParameterKey": "NodeInstanceType",
        "ParameterValue": "r5.2xlarge"
      },
      {
        "ParameterKey": "NodeAutoScalingGroupMinSize",
        "ParameterValue": "1"
      },
      {
        "ParameterKey": "NodeAutoScalingGroupMaxSize",
        "ParameterValue": "3"
      },
      {
        "ParameterKey": "NodeAutoScalingGroupDesiredCapacity",
        "ParameterValue": "2"
      },
      {
        "ParameterKey": "NodeVolumeSize",
        "ParameterValue": "20"
      },
      {
        "ParameterKey": "ClusterName",
        "ParameterValue": "amundsen-eks"
      },
      {
        "ParameterKey": "Environment",
        "ParameterValue": "dev"
      },
      {
        "ParameterKey": "BootstrapArguments",
        "ParameterValue": ""
      },
      {
        "ParameterKey": "VpcId",
        "ParameterValue": "Main"
      },
      {
        "ParameterKey": "Subnets",
        "ParameterValue": "Main-privatesubnets"
      }
    ]
    

    And the creation scripts:

    aws cloudformation create-stack \
      --stack-name amundsen-eks-cluster \
      --parameters file://./cluster-parameters.json \
      --template-body file://../../../../templates/cloud-formation/eks/amazon-eks-cluster.yaml \
      --capabilities CAPABILITY_NAMED_IAM --profile myprofile

    And the node group:

    aws cloudformation create-stack \
      --stack-name amundsen-eks-cluster-nodegroup \
      --parameters file://./nodegroup-parameters.json \
      --template-body file://../../../../templates/cloud-formation/eks/amazon-eks-nodegroup.yaml \
      --capabilities CAPABILITY_NAMED_IAM --profile myprofile
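    Since both stacks are created with the same call shape, the invocation can also be scripted to avoid copy/paste drift. A minimal sketch (the file paths and profile name are simply the ones from the commands above):

    ```python
    import subprocess

    def create_stack_cmd(stack_name: str, params_file: str, template_file: str,
                         profile: str = "myprofile") -> list:
        """Build the `aws cloudformation create-stack` argument list."""
        return [
            "aws", "cloudformation", "create-stack",
            "--stack-name", stack_name,
            "--parameters", f"file://{params_file}",
            "--template-body", f"file://{template_file}",
            "--capabilities", "CAPABILITY_NAMED_IAM",
            "--profile", profile,
        ]

    cmd = create_stack_cmd(
        "amundsen-eks-cluster",
        "./cluster-parameters.json",
        "../../../../templates/cloud-formation/eks/amazon-eks-cluster.yaml",
    )
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the stack
    print(" ".join(cmd))
    ```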

    What could cause this behavior? What else can I check to narrow it down?
