5.2 Application Deployment & Design
5.2.1 Deployments
We should never deploy a pod using kind: Pod directly. Pods are ephemeral, so we should never treat them like pets: they do not heal on their own, and if a pod is terminated it does not restart by itself. This is dangerous, because there should always be at least one replica of an application running. We therefore need a mechanism that lets the application self-heal, and this is exactly the problem that Deployments and ReplicaSets solve.
In general, Pods should be managed through Deployments. The purpose of a Deployment is to facilitate software deployment: it manages releases of a new application version, provides zero-downtime updates, and creates a ReplicaSet behind the scenes. K8s takes care of the full deployment process when a Deployment is applied, including rolling updates to a new version.
5.2.1.1 ReplicaSets
A ReplicaSet makes sure that the desired number of pods is running. When looking at the names of a Deployment's Pods, they usually have a random string attached. This is because a Deployment can have multiple replicas, and the random suffix ensures that each Pod gets a unique name. ReplicaSets work by running a background control loop that checks that the desired number of pods is always present on the cluster. We can specify the number of replicas by creating a yaml-file for a Deployment, similar to the previous Pod specifications. As a reminder, the Deployment can be applied using kubectl apply -f as well.
# deployment.yaml
apiVersion: apps/v1
# specify that we want a deployment
kind: Deployment
metadata:
  name: hello-world
spec:
  # specify number of replicas
  replicas: 3
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: hello-world
          image: seblum/mlops-public:cat-v1
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 5000
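As a rough sketch of how this plays out (assuming the file is saved as deployment.yaml and a cluster is available), applying the manifest and inspecting the created objects also demonstrates the self-healing behaviour described above; the pod name placeholder is hypothetical and has to be taken from the actual output:
# apply the deployment; K8s creates a ReplicaSet and three pods with random suffixes
kubectl apply -f deployment.yaml
kubectl get deployments,replicasets,pods -l app=hello-world
# delete one pod manually; the ReplicaSet control loop immediately recreates it
kubectl delete pod <one-of-the-hello-world-pods>
kubectl get pods -l app=hello-world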
5.2.1.2 Rolling updates
A rolling update means that a new version of the application is rolled out gradually. A basic deployment strategy (Recreate) deletes every existing pod before it creates the new version, which is dangerous since it causes downtime. The preferred strategy is a rolling update: traffic keeps going to the previous version until the new one is up and running, and is shifted over step by step until the new version is fully healthy. K8s performs the update while the application stays up. For example, when there are two ReplicaSets, one with version v1 and one with v2, K8s only scales v1 down once v2 is up and running and traffic has been redirected to v2. How does the Deployment need to be configured for that?
# deployment_rolling-update.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 3
  # a few new things have been added here
  revisionHistoryLimit: 20
  # specify the deployment strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # only one pod at a time may become unavailable
      # in our case the scaling down of v1
      maxUnavailable: 1
      # never have more than one pod above the specified replicas
      # with three replicas, there will never be more than four pods running during a rollout
      maxSurge: 1
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
      annotations:
        # just an annotation to record the version change
        kubernetes.io/change-cause: "seblum/mlops-public:cat-v2"
    spec:
      containers:
        - name: hello-world
          # only the image specification changes to v2, k8s performs the update itself
          image: seblum/mlops-public:cat-v2
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 5000
The changes can be applied as well using kubectl apply -f "file-name.yaml". Good to know: K8s does not delete the ReplicaSets of previous versions; they remain in the cluster store, scaled down to zero. The spec.revisionHistoryLimit field in the yaml controls how many of them are retained. By default, the last ten revisions are kept; the previous yaml raises this to 20, although it rarely makes sense to keep that many. Retaining old ReplicaSets is what enables rollbacks to previous versions. To avoid discrepancies in a cluster, one should always update using the declarative approach. Listed below are a number of commands that trigger and help with a rollback, or with rollouts in general.
# check the status of the current deployment process
kubectl rollout status deployment <name>
# pause the rollout of a deployment
kubectl rollout pause deployment <name>
# check the rollout history of a specific deployment
kubectl rollout history deployment <name>
# undo the rollout of a deployment and switch to the previous version
kubectl rollout undo deployment <name>
# go back to a specific revision
# the history is limited: K8s keeps 10 revisions by default (see revisionHistoryLimit)
kubectl rollout undo deployment <name> --to-revision=<revision>
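To make the rollback flow concrete, here is a minimal sketch assuming the hello-world Deployment from the previous yaml: inspect the history first, then jump back to a revision number shown there (revision 1 being the cat-v1 release in this example):
# list the recorded revisions of the hello-world deployment
kubectl rollout history deployment hello-world
# roll back to revision 1
kubectl rollout undo deployment hello-world --to-revision=1
# watch the rollback being performed as a rolling update
kubectl rollout status deployment hello-world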
5.2.2 Resource Management
Besides the health of the application itself, enough resources (e.g. memory and CPU) must be allocated so the application can perform well. At the same time, it should only consume the resources it needs and not block unneeded ones. Otherwise one application might use up a large share of the resources, leaving nothing for other applications and eventually starving them. To prevent this from happening, K8s lets us define the minimum amount of resources a container needs (requests) as well as the maximum amount it may use (limits). Requests and limits are configured within the spec of a Pod or Deployment; in fact, we have been using them all the time in the previous examples.
# resource-management.yaml
apiVersion: apps/v1
# specify that we want a deployment
kind: Deployment
metadata:
  name: rm-deployment
spec:
  # specify number of replicas
  replicas: 3
  selector:
    matchLabels:
      app: rm-deployment
  template:
    metadata:
      labels:
        app: rm-deployment
    spec:
      containers:
        - name: rm-deployment
          image: seblum/mlops-public:cat-v1
          resources:
            # Requests: the minimum amount reserved for the container
            requests:
              memory: "128Mi"
              cpu: "500m"
            # Limits: the maximum the container may use (must be >= requests)
            limits:
              memory: "512Mi"
              cpu: "1000m"
          ports:
            - containerPort: 5000
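To check what was actually applied, the scheduled requests and limits can be read back from the pods. A minimal sketch, assuming the Deployment above is running (the grep pattern is only for readability, and kubectl top additionally requires the metrics-server add-on):
# show the configured requests and limits of the matching pods
kubectl describe pods -l app=rm-deployment | grep -A 2 -E "Requests|Limits"
# show the actual current consumption (requires metrics-server)
kubectl top pods -l app=rm-deployment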
5.2.3 DaemonSets
The master node of K8s decides on which worker node a pod is scheduled. However, there are times when we want a copy of a pod on every node of the cluster, and this is exactly what a DaemonSet ensures. This is useful, for example, to deploy system daemons such as log collectors and monitoring agents. DaemonSets are automatically deployed on every single node, unless restricted to specific nodes, so they need no explicit node specification and scale up and down with the cluster: whenever a new node joins, a pod is automatically scheduled on it.
The given example deploys a DaemonSet for logging using Fluentd.
# daemonsets.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: logging
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: logging
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
        # this toleration is to have the daemonset runnable on master nodes
        # remove it if your masters can't run pods
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
      # specify the containers as done in Pods or Deployments
      containers:
        - name: fluentd-elasticsearch
          # allows to collect logs from the nodes
          image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      terminationGracePeriodSeconds: 30
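A quick way to verify the DaemonSet behaviour, as a sketch assuming the manifest above was applied: the DESIRED and READY counts should match the number of schedulable nodes, and each pod should land on a different node.
# one fluentd pod is expected per node
kubectl get daemonsets -n logging
# the NODE column shows that the pods are spread across all nodes
kubectl get pods -n logging -o wide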
5.2.4 StatefulSets
StatefulSets are used to deploy and manage stateful applications. Stateful applications are long-lived, for example databases. Most workloads on K8s are stateless, as they only run for a specific task, like Deployments or Pods. A database, however, holds the state of truth and should be available at all times. StatefulSets manage pods based on the same container specifications as Deployments. Unlike a Deployment, however, a StatefulSet ensures that each of its Pods retains a distinct identity, maintaining a persistent identifier across rescheduling.
Let's assume we have a StatefulSet with 3 replicas that is exposed by a headless service named nginx-service. Each Pod has a PersistentVolume attached.
# statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    app: nginx
spec:
  ports:
    - port: 80
      name: web-statefulset
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-statefulset
spec:
  serviceName: "nginx-service"
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: registry.k8s.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web-statefulset
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        # storageClassName: storage-class-name
        resources:
          requests:
            storage: 1Gi
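Unlike Deployment pods, StatefulSet pods get ordinal, stable names (web-statefulset-0, -1, -2) and one PersistentVolumeClaim each, created from the volumeClaimTemplates. A small sketch to verify this, assuming the manifests above were applied:
# pods are created in order with stable names instead of random suffixes
kubectl get pods -l app=nginx
# one PVC per replica, named <template-name>-<statefulset-name>-<ordinal>, e.g. data-web-statefulset-0
kubectl get pvc
# through the headless service, each pod is reachable under a stable DNS name,
# e.g. web-statefulset-0.nginx-service within the same namespace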
5.2.5 Jobs & Cron Jobs
When using the busybox image as in the section about volumes, we will experience that the container is very short-lived. K8s is not aware of this, keeps restarting the container, and runs into a CrashLoopBackOff error until it backs off completely. Because the image is so short-lived, a task has to be executed inside it, such as the shell command used previously. But what if we have a simple task that should only run, say, every 5 minutes or once a day? For such tasks it is a good idea to use Jobs and CronJobs, which start the image only when needed. Comparing the two: a Job executes only once, whereas a CronJob executes according to a specified cron expression.
The following Job simulates a backup to a database and takes about 30 seconds in total. The args specify that the container sleeps for 20 seconds (the hypothetical backup). The finished Job is then cleaned up 10 seconds after completion, as specified in ttlSecondsAfterFinished.
# job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup-job
spec:
  # time after the job finishes until it is cleaned up automatically
  ttlSecondsAfterFinished: 10
  template:
    spec:
      containers:
        - name: backup
          image: busybox
          command: ["/bin/sh", "-c"]
          args:
            - "echo 'performing db backup...' && sleep 20"
      restartPolicy: Never
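A minimal sketch of running and observing this Job, assuming the file is saved as job.yaml: the Job shows up as completed after roughly 20 seconds and, due to ttlSecondsAfterFinished, is removed again about 10 seconds later.
kubectl apply -f job.yaml
# watch the job run to completion
kubectl get jobs --watch
# read the output of the backup container
kubectl logs job/db-backup-job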
The CronJob below runs every minute. Given the structure ( * * * * * ) - ( minute hour day-of-month month day-of-week ), the schedule expression */1 * * * * triggers a new Job at every minute mark:
# cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup-cron-job
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: busybox
              command: ["/bin/sh", "-c"]
              args:
                - "echo 'performing db backup...' && sleep 20"
          restartPolicy: Never
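As a sketch, after applying the manifest the CronJob can be observed spawning a new Job each minute; old Jobs are kept according to the default history limits:
kubectl apply -f cronjob.yaml
# LAST SCHEDULE shows when the cron expression last fired
kubectl get cronjob db-backup-cron-job
# a new job named db-backup-cron-job-<suffix> appears every minute
kubectl get jobs --watch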