Chapter 8 Platform Deployment

The provided directory structure represents the Terraform project for managing the infrastructure of our ML platform. It follows a modular organization to promote reusability and maintainability of the codebase. The full codebase is also available and can be accessed on github

root
   main.tf
   variables.tf
   outputs.tf
   providers.tf

└── infrastructure

   └── vpc

   └── eks

   └── networking

   └── rds

└── modules
    
    └── airflow
    
    └── mlflow
    
    └── jupyterhub

By structuring the Terraform project this way it becomes easier to manage, scale, and maintain the infrastructure as the project grows. Each module can be independently developed, tested, and reused across different projects, promoting consistency and reducing duplication of code and effort.

Root

The root directory of the Terraform project contains the general configuration files related to the overall infrastructure setup.

  • The main.tf Terraform configuration file, where all major resources are defined and organized into modules.
  • The variables.tf containing the definition of input variables used throughout the project, allowing users to customize the infrastructure setup.
  • The outputs.tf defining the output variables that expose relevant information about the deployed infrastructure.
  • The providers.tf that defining and configuring the providers used in the project, for example, AWS, Kubernetes, Helm.

Infrastructure

The infrastructure directory holds the individual modules responsible for provisioning specific components of the AWS Cloud and EKS setup.

  • vpc defines a module that configures resources related to the Virtual Private Cloud (VPC), such as subnets, route tables, and internet gateways.
  • The eks module is responsible for creating and configuring an Amazon Elastic Kubernetes Service (EKS) cluster, including worker nodes and other related resources like the Cluster Autoscaler, Elastic Block Storage, or Elastic File System.
  • networking contains networking components that provide access to the cluster using ingresses and DNS records, for example the AWS Application Load Balancer or an External DNS.
  • The rds module provides resources to deploy and Amazon Relational Database Service (RDS), such as database instances, subnets, and security groups. This module is needed for the specific tools and components of our ML platform.

Modules

The modules directory contains Terraform modules that are specific for setting up out ML Platform and provides the components to integrate the MLOps Framework, such as tools for model tracking (MLflow), workflow management (Airflow), or a integrated development environment (JupyterHub).

  • airflow provides the Terraform module to deploy an Apache Airflow instance based on the Helm provider, which enables to orchestrate our ML workflows. The module is highly customized as it sets up necessary connections to other services, sets airflow variables that can be used by Data Scientists, creates an ingress ressource, and enables user management and authentication using Github.
  • The mlflow module sets up MLflow to managing machine learning experiments and models. As MLflow does not natively provide a solution to deploy on Kubernetes, a custom Helm deployment is integrated that configures the necessary deployment, services, and ingress ressources.
  • jupyterhub deploys a JupyterHub environment via Helm that enables multi-user notebook environment, suitable for collaborative data science and machine learning work. The Helm chart is highly customized providing user management and authentication via Github, provisioning ingress resources, and cloning a custom Github repository that provides all our Data Science and Machine Learning code.

Prerequisites & Installation

The installation and deployment process involves several key steps to ensure a smooth setup of your environment. Before proceeding with the installation, it’s crucial to complete the following essential prerequisites: installing the necessary tools, establishing a GitHub organization along with OAuth apps, and obtaining a DNS name to link with your services. Please adhere to the installation instructions of the repositories’ readme document.