MLOps Engineering
Preamble
1
Introduction
1.1
Machine Learning Workflow
1.1.1
ML Workflow Tools
1.1.2
Developing Machine Learning Models
1.2
Machine Learning Operations (MLOps)
1.2.1
ML + (Dev)-Ops
1.2.2
MLOps Lifecycle
1.2.3
MLOps Engineering
1.3
Roles and Tasks in MLOps
1.3.1
Data Engineer
1.3.2
Data Scientist
1.3.3
ML Engineer
1.3.4
MLOps Engineer
1.3.5
DevOps Engineer
1.3.6
Additional roles & functions
2
Ops Tools & Principles
2.1
Containerization
2.2
Version Control
2.2.1
GitHub
2.2.2
Git lifecycle
2.3
CI/CD
2.4
Infrastructure as code
2.5
Containerization
2.6
Version Control
2.6.1
GitHub
2.6.2
Git lifecycle
2.7
CI/CD
GitHub Actions
2.8
Infrastructure as code
3
Airflow
3.1
Core Components
3.1.1
DAGs
3.1.2
Operators
3.1.3
Tasks
3.1.4
XCom
3.1.5
Scheduling
3.1.6
Taskflow
3.2
Exemplary ML workflow
3.3
Airflow infrastructure
3.3.1
Airflow as a distributed system
3.3.2
Scheduler
3.3.3
Webserver
3.3.4
Executor
3.3.5
DAG Directory
3.3.6
Metadata Database
4
MLflow
4.1
Core Components
4.1.1
MLflow Tracking
4.1.2
MLflow Models
4.1.3
MLflow Model Registry
4.1.4
MLflow Projects
4.2
MLflow Architecture
4.2.1
MLflow Tracking Server
4.2.2
MLflow Backend Store
4.2.3
MLflow Artifact Store
5
Kubernetes
5.1
Core Components
5.1.1
Nodes
5.1.2
Pods
5.1.3
Imperative & Declarative Management
5.2
Application Deployment & Design
5.2.1
Deployments
5.2.2
Resource Management
5.2.3
DaemonSets
5.2.4
StatefulSets
5.2.5
Jobs & Cron Jobs
5.3
Services and Networking
5.3.1
Services
5.3.2
Service Discovery
5.4
Volumes and Storage
5.4.1
EmptyDir Volume
5.4.2
HostPath Volume
5.4.3
Persistent Volumes
5.5
Environment, Configuration & Security
5.5.1
Namespaces
5.5.2
Labels, Selectors and Annotations
5.5.3
ConfigMaps
5.5.4
Secrets
5.6
Observability & Maintenance
5.6.1
Health Checks
5.7
Helm
5.7.1
Helm Chart Structure
5.7.2
Working with Helm
6
Terraform
6.1
Basic usage
6.1.1
terraform init
6.1.2
terraform validate
6.1.3
terraform plan
6.1.4
terraform apply
6.1.5
terraform destroy
6.2
Core Components
6.2.1
Providers
6.2.2
Resources
6.2.3
Data Sources
6.2.4
State
6.3
Modules
6.3.1
Input Variables
6.3.2
Output Variables
6.3.3
Local Variables
6.4
Additional tips & tricks
6.4.1
count
6.4.2
for_each
6.4.3
for
6.4.4
Workspaces
6.5
Exemplary Deployment
6.5.1
root
6.5.2
vpc
6.5.3
Run the code
7
ML Platform Design
Overview
Infrastructure
MLOps Tools
8
Platform Deployment
8.1
Root module
8.2
Infrastructure
8.2.1
Virtual Private Cloud
8.2.2
Elastic Kubernetes Service
8.2.3
Networking
8.2.4
Relational Database Service
8.3
Components
8.3.1
User Profiles
8.3.2
Airflow
8.3.3
MLflow
8.3.4
JupyterHub
8.3.5
Monitoring
8.3.6
SageMaker
8.3.7
Dashboard
8.4
Design Decisions
9
Use Case Development
9.1
Integrated Development Environment
9.1.1
GitHub Repository
9.2
Training & Deployment Pipeline Workflow
9.2.1
Airflow Workflow
9.2.2
MLflow integration
9.2.3
Pipeline Workflow
9.3
Training Pipeline Steps
9.3.1
Data Preprocessing
9.3.2
Model Training
9.3.3
Data Preprocessing
9.3.4
Model Training
9.3.5
Model Comparison
9.3.6
Model Deployment & Serving
9.4
Model Inferencing
9.4.1
Pipeline Workflow
9.4.2
Inference Workflow Code
Glossary
Contributing
Acknowledgements
8.4
Design Decisions