8.2 Infrastructure
The infrastructure subdirectory consists of four main modules: vpc, eks, networking, and rds. The first three create the cluster itself as well as the tooling required to implement the platform functionalities. The rds module is an extension linked to the cluster that is needed to store data of tools such as Airflow or MLflow. Although it is placed in the infrastructure directory, the rds module is invoked from the modules that actually require an AWS RDS instance.
8.2.1 Virtual Private Cloud
The code in the vpc module establishes a Virtual Private Cloud (VPC) with associated subnets and security groups. It configures the networking and security infrastructure that serves as the foundation for deploying an AWS EKS cluster.
The VPC is created with version 5.0.0 of the terraform-aws-modules/vpc/aws module. It is assigned the IPv4 CIDR block "10.0.0.0/16" and spans the first three availability zones of the specified region eu-central-1. It includes both public and private subnets, with the private subnets routing internet-bound traffic through NAT gateways. DNS hostnames are enabled for the instances launched within the VPC.
The VPC subnets are tagged with specific metadata relevant to Kubernetes cluster management. The public subnets are tagged with "kubernetes.io/cluster/${local.cluster_name}" set to "shared" and "kubernetes.io/role/elb" set to 1. The private subnets are tagged with "kubernetes.io/cluster/${local.cluster_name}" set to "shared" and "kubernetes.io/role/internal-elb" set to 1.
Additionally, three security groups are defined to provide secure management access to the worker nodes within the EKS cluster. Two of these security groups, "worker_group_mgmt_one" and "worker_group_mgmt_two", allow SSH access from one specific CIDR block each. The third security group, "all_worker_mgmt", allows SSH access from the private ranges "10.0.0.0/8", "172.16.0.0/12", and "192.168.0.0/16".
locals {
  cluster_name = var.cluster_name
}

data "aws_availability_zones" "available" {}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name = var.vpc_name
  cidr = "10.0.0.0/16"

  azs             = slice(data.aws_availability_zones.available.names, 0, 3)
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  public_subnet_tags = {
    "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                      = 1
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"             = 1
  }
}

resource "aws_security_group" "worker_group_mgmt_one" {
  name_prefix = "worker_group_mgmt_one"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
    ]
  }
}

resource "aws_security_group" "worker_group_mgmt_two" {
  name_prefix = "worker_group_mgmt_two"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "192.168.0.0/16",
    ]
  }
}

resource "aws_security_group" "all_worker_mgmt" {
  name_prefix = "all_worker_management"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
      "172.16.0.0/12",
      "192.168.0.0/16",
    ]
  }
}
8.2.2 Elastic Kubernetes Service
The Terraform code in the eks module sets up an AWS EKS (Elastic Kubernetes Service) cluster with specific configurations and multiple node groups. The "eks" module creates the EKS cluster, specifying its name and version. The cluster has both public and private access endpoints enabled, and its AWS authentication ConfigMap is managed by Terraform. The creation of the vpc module is a prerequisite for the "eks" module, as the latter requires information such as the vpc_id and subnet_ids to be created successfully.
The EKS cluster itself is composed of three managed node groups: "group_t3_small", "group_t3_medium", and "group_t3_large". Each node group uses a different instance type (t3.small, t3.medium, and t3.large) and has its own scaling policy; all three node groups have auto-scaling enabled. The node group "group_t3_medium" sets the minimum and desired number of nodes to 4, which guarantees a base amount of nodes, and thus resources, to host further deployments. The "group_t3_large" node group is tainted with a NoSchedule taint, so it can be reserved for more resource-intensive tasks by specifying a matching toleration in a pod's spec, as sketched below.
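To illustrate how a workload can opt into the tainted "group_t3_large" nodes, the following is a minimal sketch using the Terraform Kubernetes provider; the pod name and container image are hypothetical and only serve as an example of combining a toleration with a node selector.

# Minimal sketch (hypothetical pod name and image): a pod that tolerates the
# "dedicated=t3_large:NoSchedule" taint and is pinned to the large node group.
resource "kubernetes_pod" "heavy_task_example" {
  metadata {
    name = "heavy-task-example"
  }

  spec {
    # Schedule onto nodes labeled with role = "t3_large"
    node_selector = {
      role = "t3_large"
    }

    # Tolerate the taint set on group_t3_large
    toleration {
      key      = "dedicated"
      operator = "Equal"
      value    = "t3_large"
      effect   = "NoSchedule"
    }

    container {
      name  = "worker"
      image = "busybox:1.36"
      args  = ["sleep", "3600"]
    }
  }
}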
The eks module also deploys several Kubernetes add-ons, namely coredns, kube-proxy, aws-ebs-csi-driver, and vpc-cni. The vpc-cni add-on is configured with specific environment settings that enable prefix delegation for IP addresses.
- CoreDNS provides DNS-based service discovery, allowing pods and services to communicate with each other using domain names, and thus enabling seamless communication within the cluster without the need for explicit IP addresses.
- kube-proxy is responsible for network proxying on Kubernetes nodes. It ensures that network traffic is properly routed to the appropriate pods, services, and endpoints, allowing seamless communication between different parts of the cluster.
- aws-ebs-csi-driver (Container Storage Interface) enables Kubernetes pods to use Amazon Elastic Block Store (EBS) volumes for persistent storage, allowing data to be retained across pod restarts and ensuring data durability for stateful applications. The EBS configuration and deployment are described in the following subsection Elastic Block Store, but the respective service_account_role_arn is linked to the EKS cluster on creation.
- vpc-cni (Container Network Interface) is essential for AWS EKS clusters, as it enables networking for pods using AWS VPC (Virtual Private Cloud) networking. It ensures that each pod gets an IP address from the VPC subnet and can communicate securely with other AWS resources within the VPC.
locals {
  cluster_name                         = var.cluster_name
  cluster_namespace                    = "kube-system"
  ebs_csi_service_account_name         = "ebs-csi-controller-sa"
  ebs_csi_service_account_role_name    = "${var.cluster_name}-ebs-csi-controller"
  autoscaler_service_account_name      = "autoscaler-controller-sa"
  autoscaler_service_account_role_name = "${var.cluster_name}-autoscaler-controller"

  nodegroup_t3_small_label  = "t3_small"
  nodegroup_t3_medium_label = "t3_medium"
  nodegroup_t3_large_label  = "t3_large"

  eks_asg_tag_list_nodegroup_t3_small_label = {
    "k8s.io/cluster-autoscaler/enabled" : true
    "k8s.io/cluster-autoscaler/${local.cluster_name}" : "owned"
    "k8s.io/cluster-autoscaler/node-template/label/role" : local.nodegroup_t3_small_label
  }

  eks_asg_tag_list_nodegroup_t3_medium_label = {
    "k8s.io/cluster-autoscaler/enabled" : true
    "k8s.io/cluster-autoscaler/${local.cluster_name}" : "owned"
    "k8s.io/cluster-autoscaler/node-template/label/role" : local.nodegroup_t3_medium_label
  }

  eks_asg_tag_list_nodegroup_t3_large_label = {
    "k8s.io/cluster-autoscaler/enabled" : true
    "k8s.io/cluster-autoscaler/${local.cluster_name}" : "owned"
    "k8s.io/cluster-autoscaler/node-template/label/role" : local.nodegroup_t3_large_label
    "k8s.io/cluster-autoscaler/node-template/taint/dedicated" : "${local.nodegroup_t3_large_label}:NoSchedule"
  }

  tags = {
    Owner = "terraform"
  }
}

data "aws_caller_identity" "current" {}

#
# EKS
#
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.5.1"

  cluster_name              = local.cluster_name
  cluster_version           = var.eks_cluster_version
  cluster_enabled_log_types = ["api", "controllerManager", "scheduler"]

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnets

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  manage_aws_auth_configmap       = true

  # aws_auth_users = local.cluster_users # add users in later step

  cluster_addons = {
    coredns = {
      most_recent = true
    },
    kube-proxy = {
      most_recent = true
    },
    aws-ebs-csi-driver = {
      service_account_role_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${local.ebs_csi_service_account_role_name}"
    },
    vpc-cni = {
      most_recent              = true
      before_compute           = true
      service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }

  eks_managed_node_group_defaults = {
    ami_type                   = "AL2_x86_64"
    disk_size                  = 10
    iam_role_attach_cni_policy = true
    enable_monitoring          = true
  }

  eks_managed_node_groups = {
    group_t3_small = {
      name = "ng0_t3_small"

      instance_types = ["t3.small"]

      min_size      = 0
      max_size      = 6
      desired_size  = 0
      capacity_type = "ON_DEMAND"
      labels = {
        role = local.nodegroup_t3_small_label
      }
      tags = {
        "k8s.io/cluster-autoscaler/enabled"                  = "true"
        "k8s.io/cluster-autoscaler/${local.cluster_name}"    = "owned"
        "k8s.io/cluster-autoscaler/node-template/label/role" = "${local.nodegroup_t3_small_label}"
      }
    }
    group_t3_medium = {
      name = "ng1_t3_medium"

      instance_types = ["t3.medium"]

      min_size      = 4
      max_size      = 6
      desired_size  = 4
      capacity_type = "ON_DEMAND"
      labels = {
        role = local.nodegroup_t3_medium_label
      }
      tags = {
        "k8s.io/cluster-autoscaler/enabled"                  = "true"
        "k8s.io/cluster-autoscaler/${local.cluster_name}"    = "owned"
        "k8s.io/cluster-autoscaler/node-template/label/role" = "${local.nodegroup_t3_medium_label}"
      }
    }
    group_t3_large = {
      name = "ng2_t3_large"

      instance_types = ["t3.large"]

      min_size      = 0
      max_size      = 3
      desired_size  = 0
      capacity_type = "ON_DEMAND"
      labels = {
        role = local.nodegroup_t3_large_label
      }
      taints = [
        {
          key    = "dedicated"
          value  = local.nodegroup_t3_large_label
          effect = "NO_SCHEDULE"
        }
      ]
      tags = {
        "k8s.io/cluster-autoscaler/enabled"                       = "true"
        "k8s.io/cluster-autoscaler/${local.cluster_name}"         = "owned"
        "k8s.io/cluster-autoscaler/node-template/label/role"      = "${local.nodegroup_t3_large_label}"
        "k8s.io/cluster-autoscaler/node-template/taint/dedicated" = "${local.nodegroup_t3_large_label}:NoSchedule"
      }
    }
  }

  tags = local.tags
}

# Role for Service Account
module "vpc_cni_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name_prefix      = "VPC-CNI-IRSA"
  attach_vpc_cni_policy = true
  vpc_cni_enable_ipv4   = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-node"]
    }
  }
}
8.2.2.1 Elastic Block Store
The EBS CSI controller (Elastic Block Store Container Storage Interface) is set up by defining an IAM (Identity and Access Management) role using the "ebs_csi_controller_role" module. The role allows the EBS CSI controller to assume a specific IAM role via OIDC (OpenID Connect) authentication and grants it, through an attached IAM policy, the permissions required for EBS-related actions in the AWS environment. The IAM policy associated with the role permits various EC2 actions, such as attaching and detaching volumes, creating and deleting snapshots, and describing instances and volumes.
The code also configures the default Kubernetes StorageClass named "gp2" and annotates it as not being the default storage class for the cluster, thereby controlling how storage volumes are provisioned and utilized in the cluster. Ensuring that the "gp2" StorageClass does not become the default is required because we additionally create an EFS storage (Elastic File System), which is described in the next subsection.
#
# EBS CSI controller
#
module "ebs_csi_controller_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "5.11.1"

  create_role                   = true
  role_name                     = local.ebs_csi_service_account_role_name
  provider_url                  = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = [aws_iam_policy.ebs_csi_controller_sa.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:${local.cluster_namespace}:${local.ebs_csi_service_account_name}"]
}

resource "aws_iam_policy" "ebs_csi_controller_sa" {
  name        = local.ebs_csi_service_account_name
  description = "EKS ebs-csi-controller policy for cluster ${var.cluster_name}"

  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Action" : [
          "ec2:AttachVolume",
          "ec2:CreateSnapshot",
          "ec2:CreateTags",
          "ec2:CreateVolume",
          "ec2:DeleteSnapshot",
          "ec2:DeleteTags",
          "ec2:DeleteVolume",
          "ec2:DescribeInstances",
          "ec2:DescribeSnapshots",
          "ec2:DescribeTags",
          "ec2:DescribeVolumes",
          "ec2:DetachVolume",
        ],
        "Effect" : "Allow",
        "Resource" : "*"
      }
    ]
  })
}

resource "kubernetes_annotations" "ebs-no-default-storageclass" {
  api_version = "storage.k8s.io/v1"
  kind        = "StorageClass"
  force       = "true"

  metadata {
    name = "gp2"
  }
  annotations = {
    "storageclass.kubernetes.io/is-default-class" = "false"
  }
}
8.2.2.2 Elastic File System
The EFS CSI (Elastic File System Container Storage Interface) driver permits EKS pods to use EFS as a persistent volume, enabling pods to rely on EFS as a scalable and shared storage solution. The driver itself is deployed as a Helm chart through the "helm_release" resource. As with the EBS setup, an IAM role for the EFS CSI driver is created using the "attach_efs_csi_role" module, which allows the driver to assume a role with OIDC authentication and grants the necessary permissions for working with EFS.
For security, the code creates an AWS security group named "allow_nfs" that allows inbound NFS traffic on port 2049 from the private subnets of the VPC. This allows the EFS mount targets to communicate securely with the EFS file system. The EFS file system and its mount targets are created explicitly for each private subnet by mapping an "aws_efs_mount_target" to the "aws_efs_file_system" resource.
Finally, the code defines a Kubernetes StorageClass named "efs" using the "kubernetes_storage_class_v1" resource. The StorageClass specifies the EFS CSI driver as the storage provisioner and the EFS file system created earlier as the backing storage. Additionally, the "efs" StorageClass is marked as the default storage class for the cluster via an annotation. This enables dynamic provisioning of EFS-backed persistent volumes for Kubernetes pods by default, simplifying storage handling in the EKS cluster; the Airflow deployment in a later step relies on this, for example. A short usage sketch follows the code listing below.
#
# EFS
#
resource "helm_release" "aws_efs_csi_driver" {
  chart      = "aws-efs-csi-driver"
  name       = "aws-efs-csi-driver"
  namespace  = "kube-system"
  repository = "https://kubernetes-sigs.github.io/aws-efs-csi-driver/"

  set {
    name  = "controller.serviceAccount.create"
    value = true
  }

  set {
    name  = "controller.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.attach_efs_csi_role.iam_role_arn
  }

  set {
    name  = "controller.serviceAccount.name"
    value = "efs-csi-controller-sa"
  }
}

module "attach_efs_csi_role" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"

  role_name             = "efs-csi"
  attach_efs_csi_policy = true

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:efs-csi-controller-sa"]
    }
  }
}

resource "aws_security_group" "allow_nfs" {
  name        = "allow nfs for efs"
  description = "Allow NFS inbound traffic"
  vpc_id      = var.vpc_id

  ingress {
    description = "NFS from VPC"
    from_port   = 2049
    to_port     = 2049
    protocol    = "tcp"
    cidr_blocks = var.private_subnets_cidr_blocks
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}

resource "aws_efs_file_system" "stw_node_efs" {
  creation_token = "efs-for-stw-node"
}

resource "aws_efs_mount_target" "stw_node_efs_mt_0" {
  file_system_id  = aws_efs_file_system.stw_node_efs.id
  subnet_id       = var.private_subnets[0]
  security_groups = [aws_security_group.allow_nfs.id]
}

resource "aws_efs_mount_target" "stw_node_efs_mt_1" {
  file_system_id  = aws_efs_file_system.stw_node_efs.id
  subnet_id       = var.private_subnets[1]
  security_groups = [aws_security_group.allow_nfs.id]
}

resource "aws_efs_mount_target" "stw_node_efs_mt_2" {
  file_system_id  = aws_efs_file_system.stw_node_efs.id
  subnet_id       = var.private_subnets[2]
  security_groups = [aws_security_group.allow_nfs.id]
}

resource "kubernetes_storage_class_v1" "efs" {
  metadata {
    name = "efs"
    annotations = {
      "storageclass.kubernetes.io/is-default-class" = "true"
    }
  }

  storage_provisioner = "efs.csi.aws.com"
  parameters = {
    provisioningMode = "efs-ap"                              # Dynamic provisioning
    fileSystemId     = aws_efs_file_system.stw_node_efs.id   # module.efs.id
    directoryPerms   = "777"
  }

  mount_options = [
    "iam"
  ]
}
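As a usage sketch, a persistent volume claim served by the default "efs" StorageClass could look as follows; the claim name, namespace, and requested size are assumptions chosen only for illustration. Because the StorageClass uses provisioningMode "efs-ap", the EFS CSI driver creates an access point on the file system dynamically when the claim is bound.

# Minimal sketch (hypothetical claim name and size): a PVC that is served by the
# default "efs" StorageClass and therefore backed by the EFS file system above.
resource "kubernetes_persistent_volume_claim_v1" "efs_example" {
  metadata {
    name      = "efs-example-claim"
    namespace = "default"
  }

  spec {
    access_modes       = ["ReadWriteMany"] # EFS supports shared access across pods
    storage_class_name = "efs"

    resources {
      requests = {
        storage = "5Gi"
      }
    }
  }
}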
8.2.2.3 Cluster Autoscaler
The EKS Cluster Autoscaler automatically scales the cluster's worker nodes based on workload demands, ensuring optimal resource utilization and performance.
The necessary IAM settings are put in place before the Autoscaler is deployed. First, an IAM policy named "node_additional" is created that grants permission to describe EC2 instances and related resources. This enables the Autoscaler to gather information about the current state of the worker nodes and make informed scaling decisions. For each managed node group of the EKS cluster (taken from the "eks_managed_node_groups" module output), the policy is attached to the corresponding IAM role, so that all worker nodes have the permissions required to work with the Autoscaler. After setting up the IAM policies, tags are added to the Auto Scaling Groups so that the EKS Cluster Autoscaler can identify and manage them effectively and can scale each node group up from zero. The tags are created for each node group ("nodegroup_t3_small", "nodegroup_t3_medium", and "nodegroup_t3_large") and are based on the tag lists defined in the "local.eks_asg_tag_list_*" locals.
The EKS Cluster Autoscaler itself is instantiated via the custom "eks_autoscaler" module at the bottom of the code snippet. The module sets up the Autoscaler for the EKS cluster and receives the required input variables. Its components are described in detail in the following.
#
# EKS Cluster autoscaler
#
resource "aws_iam_policy" "node_additional" {
  name        = "${local.cluster_name}-additional"
  description = "${local.cluster_name} node additional policy"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "ec2:Describe*",
        ]
        Effect   = "Allow"
        Resource = "*"
      },
    ]
  })
}

resource "aws_iam_role_policy_attachment" "additional" {
  for_each = module.eks.eks_managed_node_groups

  policy_arn = aws_iam_policy.node_additional.arn
  role       = each.value.iam_role_name
}

# Tags for the ASG to support cluster-autoscaler scale up from 0 for nodegroup2
resource "aws_autoscaling_group_tag" "nodegroup_t3_small" {
  for_each               = local.eks_asg_tag_list_nodegroup_t3_small_label
  autoscaling_group_name = element(module.eks.eks_managed_node_groups_autoscaling_group_names, 2)

  tag {
    key                 = each.key
    value               = each.value
    propagate_at_launch = true
  }
}

resource "aws_autoscaling_group_tag" "nodegroup_t3_medium" {
  for_each               = local.eks_asg_tag_list_nodegroup_t3_medium_label
  autoscaling_group_name = element(module.eks.eks_managed_node_groups_autoscaling_group_names, 1)

  tag {
    key                 = each.key
    value               = each.value
    propagate_at_launch = true
  }
}

resource "aws_autoscaling_group_tag" "nodegroup_t3_large" {
  for_each               = local.eks_asg_tag_list_nodegroup_t3_large_label
  autoscaling_group_name = element(module.eks.eks_managed_node_groups_autoscaling_group_names, 0)

  tag {
    key                 = each.key
    value               = each.value
    propagate_at_launch = true
  }
}

module "eks_autoscaler" {
  source = "./autoscaler"

  cluster_name                    = local.cluster_name
  cluster_namespace               = local.cluster_namespace
  aws_region                      = var.aws_region
  cluster_oidc_issuer_url         = module.eks.cluster_oidc_issuer_url
  autoscaler_service_account_name = local.autoscaler_service_account_name
}
The configuration of the Cluster Autoscaler begins with the creation of a Helm release named "cluster-autoscaler" using the "helm_release" resource. The Helm chart is sourced from the "https://kubernetes.github.io/autoscaler" repository in chart version "9.10.7". The settings of the Helm release include the AWS region, RBAC (Role-Based Access Control) settings for the service account, cluster auto-discovery settings, and the creation of the service account with the required permissions.
The resources needed for these settings are created next. The service account is backed by the "iam_assumable_role_admin" module, which provides an assumable IAM role that grants the service account access to the resources required for scaling; it is associated with the OIDC (OpenID Connect) provider of the cluster to permit access.
An IAM policy named "cluster_autoscaler" is created to permit the Cluster Autoscaler to interact with Auto Scaling Groups, EC2 instances, launch configurations, and tags. The policy includes two statements: "clusterAutoscalerAll" and "clusterAutoscalerOwn". The first statement grants read access to Auto Scaling Group-related resources, while the second allows the Cluster Autoscaler to modify the desired capacity of the Auto Scaling Groups and terminate instances. The second statement also includes conditions ensuring that the Cluster Autoscaler can only modify resources carrying specific tags: the Auto Scaling Group must have the tag "k8s.io/cluster-autoscaler/enabled" set to "true" and the tag "k8s.io/cluster-autoscaler/<cluster_name>" set to "owned". These are exactly the tags we set when configuring the managed node groups for the EKS cluster in the previous step.
"helm_release" "cluster-autoscaler" {
resource = "cluster-autoscaler"
name = var.cluster_namespace
namespace = "https://kubernetes.github.io/autoscaler"
repository = "cluster-autoscaler"
chart = "9.10.7"
version = false
create_namespace
set {= "awsRegion"
name = var.aws_region
value
}
set {= "rbac.serviceAccount.name"
name = var.autoscaler_service_account_name
value
}
set {= "rbac.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
name = module.iam_assumable_role_admin.iam_role_arn
value = "string"
type
}
set {= "autoDiscovery.clusterName"
name = var.cluster_name
value
}
set {= "autoDiscovery.enabled"
name = "true"
value
}
set {= "rbac.create"
name = "true"
value
}
}
"iam_assumable_role_admin" {
module = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
source = "~> 4.0"
version = true
create_role = "cluster-autoscaler"
role_name = replace(var.cluster_oidc_issuer_url, "https://", "")
provider_url = [aws_iam_policy.cluster_autoscaler.arn]
role_policy_arns = ["system:serviceaccount:${var.cluster_namespace}:${var.autoscaler_service_account_name}"]
oidc_fully_qualified_subjects
}
"aws_iam_policy" "cluster_autoscaler" {
resource = "cluster-autoscaler"
name_prefix = "EKS cluster-autoscaler policy for cluster ${var.cluster_name}"
description = data.aws_iam_policy_document.cluster_autoscaler.json
policy
}
"aws_iam_policy_document" "cluster_autoscaler" {
data
statement {= "clusterAutoscalerAll"
sid = "Allow"
effect
= [
actions "autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeTags",
"ec2:DescribeLaunchTemplateVersions",
]
= ["*"]
resources
}
statement {= "clusterAutoscalerOwn"
sid = "Allow"
effect
= [
actions "autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"autoscaling:UpdateAutoScalingGroup",
]
= ["*"]
resources
condition {= "StringEquals"
test = "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/${var.cluster_name}"
variable = ["owned"]
values
}
condition {= "StringEquals"
test = "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/enabled"
variable = ["true"]
values
}
} }
8.2.3 Networking
The networking module of the infrastructure directory integrates an Application Load Balancer (ALB) and External DNS into the cluster. Both play crucial roles in managing and exposing Kubernetes applications within the EKS cluster to the outside world. The ALB serves as an Ingress Controller to route external traffic to Kubernetes services, while External DNS automates the management of DNS records, making it easier to access services using user-friendly domain names. The root module of networking simply calls both submodules, which are described in detail in the upcoming sections.
"external-dns" {
module ...
}
"application-load-balancer" {
module ...
}
8.2.3.1 AWS Application Load Balancer (ALB)
The ALB is a managed load balancer service provided by AWS. In the context of an EKS cluster, the ALB serves as an Ingress Controller and thus is responsible for routing external traffic to the appropriate services and pods running inside the Kubernetes cluster. The ALB acts as the entry point to our applications and enables us to expose multiple services over a single public IP address or domain name, which simplifies access for users and clients.
The code starts by defining some local variables, followed by creating an assumable IAM role for the AWS Load Balancer Controller service account via the aws_load_balancer_controller_controller_role module. The service account holds the necessary permissions and is associated with the OIDC provider of the EKS cluster; it is the same module call that has already been used multiple times before. The IAM policy for the role is defined in the "aws_iam_policy.aws_load_balancer_controller_controller_sa" resource.
Since its policy document is quite extensive, it is loaded from a file named "AWSLoadBalancerControllerPolicy.json". In summary, this IAM document allows the AWS Elastic Load Balancing (ELB) controller, specifically the Elastic Load Balancer V2 (ELBV2) API, to perform various actions related to managing load balancers, target groups, listeners, rules, and tags. The document includes several "Allow" statements that grant permissions for actions like describing and managing load balancers, target groups, listeners, and rules. It also allows the controller to create and delete load balancers, target groups, and listeners, as well as modify their attributes. Additionally, the document permits the addition and removal of tags on ELBV2 resources.
After setting up the IAM role, the code installs the AWS Load Balancer Controller using Helm. The Helm chart is sourced from the "https://aws.github.io/eks-charts" repository, pinning the controller image to tag "v2.4.2". The service account configuration is passed to the Helm release's values, including the name of the service account and annotations that associate it with the IAM role created earlier. The "eks.amazonaws.com/role-arn" annotation points to the ARN of that IAM role, allowing the controller to assume the role and operate with the appropriate permissions.
locals {= "aws-load-balancer-controller-role"
aws_load_balancer_controller_service_account_role_name = "aws-load-balancer-controller-sa"
aws_load_balancer_controller_service_account_name
}
"aws_caller_identity" "current" {}
data "aws_region" "current" {} #
data
"aws_load_balancer_controller_controller_role" {
module = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
source = "5.11.1"
version = true
create_role = local.aws_load_balancer_controller_service_account_role_name
role_name = replace(var.cluster_oidc_issuer_url, "https://", "")
provider_url = [aws_iam_policy.aws_load_balancer_controller_controller_sa.arn]
role_policy_arns = ["system:serviceaccount:kube-system:${local.aws_load_balancer_controller_service_account_name}"]
oidc_fully_qualified_subjects
}
"aws_iam_policy" "aws_load_balancer_controller_controller_sa" {
resource = local.aws_load_balancer_controller_service_account_name
name = "EKS ebs-csi-controller policy for cluster ${var.cluster_name}"
description
= file("${path.module}/AWSLoadBalancerControllerPolicy.json")
policy
}
"helm_release" "aws-load-balancer-controller" {
resource = var.helm_chart_name
name = var.namespace
namespace = "aws-load-balancer-controller"
chart = false
create_namespace
= "https://aws.github.io/eks-charts"
repository = var.helm_chart_version
version
= [yamlencode({
values = var.cluster_name
clusterName = {
image = "v2.4.2"
tag ,
}= {
serviceAccount = "${local.aws_load_balancer_controller_service_account_name}"
name = {
annotations "eks.amazonaws.com/role-arn" = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${local.aws_load_balancer_controller_service_account_role_name}"
}
}
})] }
8.2.3.2 External DNS
External DNS is a Kubernetes add-on that automates the creation and management of DNS records for Kubernetes services. It is particularly useful when services are exposed to the internet through the ALB or any other Ingress Controller. When an Ingress resource is created that defines how external traffic should be routed to services within the EKS cluster, External DNS automatically updates the DNS provider with the corresponding DNS records (in our case this is Route 53 in AWS). Automatically configuring DNS records ensures that the records are always up-to-date, which helps maintain consistency and reliability in the DNS configuration, and users can access the Kubernetes services using user-friendly domain names rather than relying on IP addresses.
The code is structured similarly to the ALB setup and first defines local variables, followed by creating a service account to interact with AWS resources. The service account, its role with OIDC, and the policy with the relevant permissions are created by the external_dns_controller_role module, in the same way as in the previous implementations. The policy allows the external DNS controller to operate within the specified AWS Route 53 hosted zone, for example changing resource record sets and listing hosted zones and resource record sets.
Finally, Helm is used to deploy the external DNS controller as a Kubernetes resource. The Helm release configuration specifies the previously created service account, the IAM role-arn associated with it, the aws.region in which the Route 53 hosted zone exists, and a domainFilter which restricts the controller to a specific domain provided by us. An illustrative Ingress sketch follows the code listing below.
locals {= "external-dns-role"
external_dns_service_account_role_name = "external-dns-sa"
external_dns_service_account_name
}
"aws_caller_identity" "current" {}
data "aws_region" "current" {} #
data
"external_dns_controller_role" {
module = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
source = "5.11.1"
version = true
create_role = local.external_dns_service_account_role_name
role_name = replace(var.cluster_oidc_issuer_url, "https://", "")
provider_url = [aws_iam_policy.external_dns_controller_sa.arn]
role_policy_arns = ["system:serviceaccount:${var.namespace}:${local.external_dns_service_account_name}"]
oidc_fully_qualified_subjects
}
"aws_iam_policy" "external_dns_controller_sa" {
resource = local.external_dns_service_account_name
name = "EKS ebs-csi-controller policy for cluster ${var.cluster_name}"
description
= jsonencode({
policy "Version" : "2012-10-17",
"Statement" : [
{"Effect" : "Allow",
"Action" : [
"route53:ChangeResourceRecordSets"
,
]"Resource" : [
"arn:aws:route53:::hostedzone/*"
],
}
{"Effect" : "Allow",
"Action" : [
"route53:ListHostedZones",
"route53:ListResourceRecordSets"
,
]"Resource" : [
"*"
]
}
]
})
}
"helm_release" "external_dns" {
resource = var.name
name = var.namespace
namespace = var.helm_chart_name
chart = false
create_namespace
= "https://charts.bitnami.com/bitnami"
repository = var.helm_chart_version
version
= [yamlencode({
values = {
serviceAccount = true
create = "${local.external_dns_service_account_name}"
name = {
annotations "eks.amazonaws.com/role-arn" = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${local.external_dns_service_account_role_name}"
},
}= {
aws = "public"
zoneType = "${data.aws_region.current.name}"
region ,
}= "sync"
policy = [
domainFilter "${var.domain_name}"
]= "aws"
provider = "${var.name}"
txtOwnerId
})] }
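To show how the two controllers work together, the following is a minimal sketch of an Ingress for a hypothetical application; the host name, namespace, and backend service are assumptions. The AWS Load Balancer Controller reacts to the "alb" ingress class and provisions an ALB, while External DNS picks up the host field and creates the matching Route 53 record, provided the host lies within the configured domainFilter.

# Minimal sketch (hypothetical host, namespace, and backend service): an Ingress
# handled by the AWS Load Balancer Controller and published by External DNS.
resource "kubernetes_ingress_v1" "example_app" {
  metadata {
    name      = "example-app"
    namespace = "default"
    annotations = {
      "alb.ingress.kubernetes.io/scheme"      = "internet-facing"
      "alb.ingress.kubernetes.io/target-type" = "ip"
    }
  }

  spec {
    ingress_class_name = "alb"

    rule {
      host = "app.example.com" # must match the domainFilter of External DNS
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "example-app"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}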
8.2.4 Relational Database Service
The Amazon RDS (Relational Database Service) instance is provisioned by the aws_db_instance resource. It configures the instance with the specified settings, such as allocated_storage, storage_type, engine, db_name, username, and password. All these parameters are provided whenever the module is invoked, e.g. in the Airflow or MLflow modules. Setting skip_final_snapshot to true means that no final DB snapshot is created when the instance is deleted.
The resource aws_db_subnet_group creates an RDS subnet group named "vpc-subnet-group-${local.rds_name}". It associates the RDS instance with the private subnets specified in the vpc module and defines the subnets in which the RDS instance can be launched. In addition to the subnet group, the RDS instance uses its own security group. The security group aws_security_group is attached to the RDS instance and specifies ingress (inbound) and egress (outbound) rules to control network traffic. In this case, it allows inbound access on the port used by the RDS engine (5432 for PostgreSQL) from the CIDR blocks given in private_subnets_cidr_blocks, and allows all outbound traffic (0.0.0.0/0) from the RDS instance.
The rds module is not strictly required to run a Kubernetes cluster. It is merely an extension of the cluster and is needed to store relevant data of the tools used, such as Airflow or MLflow. The module is therefore called directly from the Airflow and MLflow modules, as sketched after the code listing below.
locals {
  rds_name           = var.rds_name
  rds_engine         = var.rds_engine
  rds_engine_version = var.rds_engine_version
  rds_port           = var.rds_port
}

resource "aws_db_subnet_group" "default" {
  name       = "vpc-subnet-group-${local.rds_name}"
  subnet_ids = var.private_subnets
}

resource "aws_db_instance" "rds_instance" {
  allocated_storage      = var.max_allocated_storage
  storage_type           = var.storage_type
  engine                 = local.rds_engine
  engine_version         = local.rds_engine_version
  instance_class         = var.rds_instance_class
  db_name                = "${local.rds_name}_db"
  username               = "${local.rds_name}_admin"
  password               = var.rds_password
  identifier             = "${local.rds_name}-${local.rds_engine}"
  port                   = local.rds_port
  vpc_security_group_ids = [aws_security_group.rds_sg.id]
  db_subnet_group_name   = aws_db_subnet_group.default.name
  skip_final_snapshot    = true
}

resource "aws_security_group" "rds_sg" {
  name   = "${local.rds_name}-${local.rds_engine}-sg"
  vpc_id = var.vpc_id

  ingress {
    description = "Enable postgres access"
    from_port   = local.rds_port
    to_port     = local.rds_port
    protocol    = "tcp"
    cidr_blocks = var.private_subnets_cidr_blocks
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
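To make the cross-module usage concrete, the following is a hedged sketch of how the Airflow module might invoke the rds module; the relative source path and the concrete values are assumptions and will differ in the actual repository. The input names correspond to the variables consumed by the module above.

# Minimal sketch (assumed path and values): invoking the rds module from another
# module, e.g. Airflow, which then stores its metadata in the provisioned database.
module "airflow_rds" {
  source = "../infrastructure/rds" # assumed relative path

  rds_name           = "airflow"
  rds_engine         = "postgres"
  rds_engine_version = "14.7"
  rds_port           = 5432

  max_allocated_storage = 20
  storage_type          = "gp2"
  rds_instance_class    = "db.t3.micro"
  rds_password          = var.rds_password

  vpc_id                      = var.vpc_id
  private_subnets             = var.private_subnets
  private_subnets_cidr_blocks = var.private_subnets_cidr_blocks
}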