Airflow Helm chart: web pod cannot connect to postgresql
I'm running Airflow on k8s via the Helm chart. However, the web pod can't seem to connect to postgresql. Strangely, the other pods can.
I put together a small script to check, and here's what I found:
[root@ip-10-56-173-248 bin]# cat checkpostgres.sh
docker exec -u root $1 /bin/nc -zvw2 airflow-postgresql 5432
[root@ip-10-56-173-248 bin]# docker ps --format '{{.Names}}\t{{.ID}}'|grep k8s_airflow|grep default|awk '{printf("%s ",$1); system("checkpostgres.sh " $2)}'
k8s_airflow-web_airflow-web-57c6dcd77b-dvjmv_default_67d74586-284b-11ea-8021-0249931777ef_74 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) : Connection timed out
k8s_airflow-worker_airflow-worker-0_default_67e1703a-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
k8s_airflow-scheduler_airflow-scheduler-5d9b688ccf-zdjdl_default_67d3fab4-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
k8s_airflow-postgresql_airflow-postgresql-76c954bb7f-gwq68_default_67d1cf3d-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
k8s_airflow-redis_airflow-redis-master-0_default_67d9aa36-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (?) open
k8s_airflow-flower_airflow-flower-79c999764d-d4q58_default_67d267e2-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [172.20.166.209] 5432 (postgresql) open
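For what it's worth, the same probe can be run through `kubectl exec` instead of `docker exec`, so it doesn't depend on being logged in to the right node. This is a sketch; it assumes `nc` is present in the target image, which the docker-based check above suggests it is:

```shell
#!/bin/sh
# pg-probe.sh - same nc check as checkpostgres.sh, but via kubectl exec.
# Usage: pg-probe.sh <pod-name> [host] [port]
pod="$1"
host="${2:-airflow-postgresql}"   # defaults match the service being probed
port="${3:-5432}"
kubectl exec -n default "$pod" -- nc -zvw2 "$host" "$port"
```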
Here's my k8s version info:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:11:03Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-c0eccc", GitCommit:"c0eccca51d7500bb03b2f163dd8d534ffeb2f7a2", GitTreeState:"clean", BuildDate:"2019-12-22T23:14:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
When I nslookup the service name from a pod, it seems to work fine:
# nslookup airflow-postgresql
Server: 172.20.0.10
Address: 172.20.0.10#53
Non-authoritative answer:
Name: airflow-postgresql.default.svc.cluster.local
Address: 172.20.166.209
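Since the name resolves but the TCP connection times out only from the web pod, I also want to rule out a NetworkPolicy selecting that pod. For comparison, a minimal allow-all-egress policy would look like the following (a sketch; the `app: airflow-web` label and the policy name are assumptions, not something the chart necessarily sets):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-egress        # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: airflow-web          # assumed label; check the pod's actual labels
  policyTypes:
    - Egress
  egress:
    - {}                        # allow all egress
```

If applying something like this changed the behavior, a policy would be the culprit; so far I haven't found any policies in the namespace.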
EDIT: As requested, here is the EKS setup.
amazon-eks-nodegroup.yaml:
---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS - Node Group'
Parameters:
  KeyName:
    Description: The EC2 Key Pair to allow SSH access to the instances
    Type: AWS::EC2::KeyPair::KeyName
  NodeImageId:
    Type: AWS::EC2::Image::Id
    Description: AMI id for the node instances.
  NodeInstanceType:
    Description: EC2 instance type for the node instances
    Type: String
    Default: t3.medium
    AllowedValues:
      - t2.small
      - t2.medium
      - t2.large
      - t2.xlarge
      - t2.2xlarge
      - t3.nano
      - t3.micro
      - t3.small
      - t3.medium
      - t3.large
      - t3.xlarge
      - t3.2xlarge
      - m3.medium
      - m3.large
      - m3.xlarge
      - m3.2xlarge
      - m4.large
      - m4.xlarge
      - m4.2xlarge
      - m4.4xlarge
      - m4.10xlarge
      - m5.large
      - m5.xlarge
      - m5.2xlarge
      - m5.4xlarge
      - m5.12xlarge
      - m5.24xlarge
      - c4.large
      - c4.xlarge
      - c4.2xlarge
      - c4.4xlarge
      - c4.8xlarge
      - c5.large
      - c5.xlarge
      - c5.2xlarge
      - c5.4xlarge
      - c5.9xlarge
      - c5.18xlarge
      - i3.large
      - i3.xlarge
      - i3.2xlarge
      - i3.4xlarge
      - i3.8xlarge
      - i3.16xlarge
      - r3.xlarge
      - r3.2xlarge
      - r3.4xlarge
      - r3.8xlarge
      - r4.large
      - r4.xlarge
      - r4.2xlarge
      - r4.4xlarge
      - r4.8xlarge
      - r4.16xlarge
      - x1.16xlarge
      - x1.32xlarge
      - p2.xlarge
      - p2.8xlarge
      - p2.16xlarge
      - p3.2xlarge
      - p3.8xlarge
      - p3.16xlarge
      - r5.large
      - r5.xlarge
      - r5.2xlarge
      - r5.4xlarge
      - r5.12xlarge
      - r5.24xlarge
      - r5d.large
      - r5d.xlarge
      - r5d.2xlarge
      - r5d.4xlarge
      - r5d.12xlarge
      - r5d.24xlarge
      - z1d.large
      - z1d.xlarge
      - z1d.2xlarge
      - z1d.3xlarge
      - z1d.6xlarge
      - z1d.12xlarge
    ConstraintDescription: Must be a valid EC2 instance type
  NodeAutoScalingGroupMinSize:
    Type: Number
    Description: Minimum size of Node Group ASG.
    Default: 1
  NodeAutoScalingGroupMaxSize:
    Type: Number
    Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.
    Default: 4
  NodeAutoScalingGroupDesiredCapacity:
    Type: Number
    Description: Desired capacity of Node Group ASG.
    Default: 3
  NodeVolumeSize:
    Type: Number
    Description: Node volume size
    Default: 20
  ClusterName:
    Description: The cluster name provided when the cluster was created. If it is incorrect, nodes will not be able to join the cluster. i.e. "eks"
    Type: String
  Environment:
    Description: The Environment value provided when the cluster was created. i.e. "dev"
    Type: String
  BootstrapArguments:
    Description: Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami
    Default: ""
    Type: String
  VpcId:
    Description: The VPC of the worker instances stack reference
    Type: String
  Subnets:
    Description: The subnets where workers can be created.
    Type: String
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "EKS Cluster"
        Parameters:
          - ClusterName
      - Label:
          default: "dev"
        Parameters:
          - Environment
      - Label:
          default: "Worker Node Configuration"
        Parameters:
          - NodeAutoScalingGroupMinSize
          - NodeAutoScalingGroupDesiredCapacity
          - NodeAutoScalingGroupMaxSize
          - NodeInstanceType
          - NodeImageId
          - NodeVolumeSize
          - KeyName
          - BootstrapArguments
      - Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcId
          - Subnets
Resources:
  NodeInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: !Sub "${ClusterName}-${Environment}-cluster-node-instance-profile"
      Path: "/"
      Roles:
        - !Ref NodeInstanceRole

  NodeInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "${ClusterName}-${Environment}-cluster-node-instance-role"
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
        - arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
      Policies:
        - PolicyName: "change-r53-recordsets"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: route53:ChangeResourceRecordSets
                Resource: !Sub
                  - "arn:aws:route53:::hostedzone/${ZoneId}"
                  - {ZoneId: !ImportValue DNS-AccountZoneID}
        - PolicyName: "list-r53-resources"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - route53:ListHostedZones
                  - route53:ListResourceRecordSets
                Resource: "*"

  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      GroupName: !Sub "${ClusterName}-${Environment}-cluster-security-group"
      VpcId:
        Fn::ImportValue:
          !Sub ${VpcId}-vpcid
      Tags:
        - Key: !Sub "kubernetes.io/cluster/${ClusterName}-${Environment}-cluster"
          Value: 'owned'

  NodeSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: '-1'
      FromPort: 0
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId:
        Fn::ImportValue:
          !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  ControlPlaneEgressToNodeSecurityGroup:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet and pods
      GroupId:
        Fn::ImportValue:
          !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId:
        Fn::ImportValue:
          !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
      GroupId:
        Fn::ImportValue:
          !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ClusterControlPlaneSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      GroupId:
        Fn::ImportValue:
          !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Sub "${ClusterName}-${Environment}-cluster-nodegroup"
      DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
      LaunchConfigurationName: !Ref NodeLaunchConfig
      MinSize: !Ref NodeAutoScalingGroupMinSize
      MaxSize: !Ref NodeAutoScalingGroupMaxSize
      VPCZoneIdentifier:
        - Fn::Select:
            - 0
            - Fn::Split:
                - ","
                - Fn::ImportValue:
                    !Sub ${Subnets}
        - Fn::Select:
            - 1
            - Fn::Split:
                - ","
                - Fn::ImportValue:
                    !Sub ${Subnets}
        - Fn::Select:
            - 2
            - Fn::Split:
                - ","
                - Fn::ImportValue:
                    !Sub ${Subnets}
      Tags:
        - Key: Name
          Value: !Sub "${ClusterName}-${Environment}-cluster-nodegroup"
          PropagateAtLaunch: 'true'
        - Key: !Sub 'kubernetes.io/cluster/${ClusterName}-${Environment}-cluster'
          Value: 'owned'
          PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: '1'
        MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
        PauseTime: 'PT5M'

  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      LaunchConfigurationName: !Sub "${ClusterName}-${Environment}-cluster-node-launch-config"
      AssociatePublicIpAddress: 'true'
      IamInstanceProfile: !Ref NodeInstanceProfile
      ImageId: !Ref NodeImageId
      InstanceType: !Ref NodeInstanceType
      KeyName: !Ref KeyName
      SecurityGroups:
        - !Ref NodeSecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeSize: !Ref NodeVolumeSize
            VolumeType: gp2
            DeleteOnTermination: true
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            set -o xtrace
            /etc/eks/bootstrap.sh ${BootstrapArguments} ${ClusterName}-${Environment}-cluster
            sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
            sudo start amazon-ssm-agent
            sudo sysctl -w vm.max_map_count=262144
            /opt/aws/bin/cfn-signal --exit-code $? \
                --stack ${AWS::StackName} \
                --resource NodeGroup \
                --region ${AWS::Region}

Outputs:
  NodeInstanceRole:
    Description: The node instance role
    Value: !GetAtt NodeInstanceRole.Arn
    Export:
      Name: !Sub "${ClusterName}-${Environment}-cluster-nodegroup-rolearn"
  NodeSecurityGroup:
    Description: The security group for the node group
    Value: !Ref NodeSecurityGroup
amazon-eks-cluster.yaml:
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS - Cluster'
Parameters:
  VPCStack:
    Type: String
    Description: VPC Stack Name
  ClusterName:
    Type: String
    Description: EKS Cluster Name (i.e. "eks")
  Environment:
    Type: String
    Description: Environment for this Cluster (i.e. "dev") which will be appended to the ClusterName (i.e. "eks-dev")

Resources:
  ClusterRole:
    Description: Allows EKS to manage clusters on your behalf.
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "${ClusterName}-${Environment}-cluster-role"
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          Effect: Allow
          Principal:
            Service:
              - eks.amazonaws.com
          Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
      Policies:
        - PolicyName: "change-r53-recordsets"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: route53:ChangeResourceRecordSets
                Resource: !Sub
                  - "arn:aws:route53:::hostedzone/${ZoneId}"
                  - {ZoneId: !ImportValue DNS-AccountZoneID}
        - PolicyName: "list-r53-resources"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - route53:ListHostedZones
                  - route53:ListResourceRecordSets
                Resource: "*"

  ClusterControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Sub "${ClusterName}-${Environment}-cluster-control-plane-sg"
      GroupDescription: Cluster communication with worker nodes
      VpcId:
        Fn::ImportValue:
          !Sub "${VPCStack}-vpcid"

  Cluster:
    Type: "AWS::EKS::Cluster"
    Properties:
      Version: "1.14"
      Name: !Sub "${ClusterName}-${Environment}-cluster"
      RoleArn: !GetAtt ClusterRole.Arn
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ClusterControlPlaneSecurityGroup
        SubnetIds:
          - Fn::Select:
              - 0
              - Fn::Split:
                  - ","
                  - Fn::ImportValue:
                      !Sub "${VPCStack}-privatesubnets"
          - Fn::Select:
              - 1
              - Fn::Split:
                  - ","
                  - Fn::ImportValue:
                      !Sub "${VPCStack}-privatesubnets"
          - Fn::Select:
              - 2
              - Fn::Split:
                  - ","
                  - Fn::ImportValue:
                      !Sub "${VPCStack}-privatesubnets"

  Route53Cname:
    Type: "AWS::Route53::RecordSet"
    Properties:
      HostedZoneId: !ImportValue DNS-AccountZoneID
      Comment: CNAME for Control Plane Endpoint
      Name: !Sub
        - "k8s.${Environment}.${Zone}"
        - { Zone: !ImportValue Main-zone-name }
      Type: CNAME
      TTL: '900'
      ResourceRecords:
        - !GetAtt Cluster.Endpoint

Outputs:
  ClusterName:
    Value: !Ref Cluster
    Description: Cluster Name
    Export:
      Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterName"
  ClusterArn:
    Value: !GetAtt Cluster.Arn
    Description: Cluster Arn
    Export:
      Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterArn"
  ClusterEndpoint:
    Value: !GetAtt Cluster.Endpoint
    Description: Cluster Endpoint
    Export:
      Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterEndpoint"
  ClusterControlPlaneSecurityGroup:
    Value: !Ref ClusterControlPlaneSecurityGroup
    Description: ClusterControlPlaneSecurityGroup
    Export:
      Name: !Sub "${ClusterName}-${Environment}-cluster-ClusterControlPlaneSecurityGroup"
cluster-parameters.json:
[
  {
    "ParameterKey": "VPCStack",
    "ParameterValue": "Main"
  },
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "amundsen-eks"
  },
  {
    "ParameterKey": "Environment",
    "ParameterValue": "dev"
  }
]
nodegroup-parameters.json:
[
  {
    "ParameterKey": "KeyName",
    "ParameterValue": "data-warehouse-dev"
  },
  {
    "ParameterKey": "NodeImageId",
    "ParameterValue": "ami-08739803f18dcc019"
  },
  {
    "ParameterKey": "NodeInstanceType",
    "ParameterValue": "r5.2xlarge"
  },
  {
    "ParameterKey": "NodeAutoScalingGroupMinSize",
    "ParameterValue": "1"
  },
  {
    "ParameterKey": "NodeAutoScalingGroupMaxSize",
    "ParameterValue": "3"
  },
  {
    "ParameterKey": "NodeAutoScalingGroupDesiredCapacity",
    "ParameterValue": "2"
  },
  {
    "ParameterKey": "NodeVolumeSize",
    "ParameterValue": "20"
  },
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "amundsen-eks"
  },
  {
    "ParameterKey": "Environment",
    "ParameterValue": "dev"
  },
  {
    "ParameterKey": "BootstrapArguments",
    "ParameterValue": ""
  },
  {
    "ParameterKey": "VpcId",
    "ParameterValue": "Main"
  },
  {
    "ParameterKey": "Subnets",
    "ParameterValue": "Main-privatesubnets"
  }
]
And the create script for the cluster:
aws cloudformation create-stack \
--stack-name amundsen-eks-cluster \
--parameters file://./cluster-parameters.json \
--template-body file://../../../../templates/cloud-formation/eks/amazon-eks-cluster.yaml \
--capabilities CAPABILITY_NAMED_IAM --profile myprofile
And for the node group:
aws cloudformation create-stack \
--stack-name amundsen-eks-cluster-nodegroup \
--parameters file://./nodegroup-parameters.json \
--template-body file://../../../../templates/cloud-formation/eks/amazon-eks-nodegroup.yaml \
--capabilities CAPABILITY_NAMED_IAM --profile myprofile
What could cause this behavior, and what else can I check to narrow it down?
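One check I can think of: see whether the web pod landed on a different node than the pods that can connect; if it did, the problem is probably node-level (security group membership, CNI state) rather than Kubernetes DNS. A sketch of that comparison:

```shell
#!/bin/sh
# List each airflow pod together with the node it is scheduled on.
# With `kubectl get pods -o wide`, NAME is column 1 and NODE is column 7.
kubectl get pods -n default -o wide --no-headers \
  | awk '/airflow/ { printf "%-60s %s\n", $1, $7 }'
```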