9.1 Integrated Development Environment
Jupyterhub serves as the integrated server environment within the ML platform, providing an Integrated Development Environment (IDE). However, it deviates from the traditional Jupyter Notebooks and instead utilizes VSCode as the IDE. Upon initialization, Jupyterhub clones the GitHub repository mlops-airflow-DAGs that contains the code for the use case. This is the same repository that is synchronized with Airflow to load the provided DAGs. The purpose of this approach is to offer a user-friendly and efficient development experience to platform users.
In addition to synchronizing with Airflow, the use of GitHub provides additional tools to streamline development and incorporate DevOps practices effectively. One of these tools is Github Actions, which enables automation through CI/CD (Continuous Integration/Continuous Deployment) and supports the Git workflow. By configuring Github Actions through code, developers can seamlessly integrate these practices into their development processes. The configuration files for Github Actions are also cloned alongside the repository, ensuring that the integration and development processes remain smooth and efficient.
9.1.1 Github Repository
The code for the model pipeline is located in the GitHub repository called mlops-airflow-DAGs. It encompasses the setup of an Airflow DAG and the utilization of MLflow. The code responsible for the pipeline functionality can be located in the src subdirectory of the repository. The Airflow DAG is written in the airflow_DAG.py file situated in the root directory of the repository. Additionally, the repository includes a GitHub Actions workflow file located in the .github/workflows/ subdirectory. The Docker subdirectory contains the Dockerfiles for the various containers used in the ML pipeline. These include a container with the code for data preprocessing and model training, a Dockerfile for serving the model using fastAPI, and a file that runs a streamlit app for sending inferences to the served model.
root
│
│   Readme.md
└── .github/workflows/
│
└── cnn_skin_cancer
│   │   airflow_docker_DAG.py
│   │   airflow_k8s_test_inference_DAG.py
│   │   airflow_k8s_workflow_DAG.py
│   │
│   └── Docker
│   │   └── python-base-cnn-model
│   │       └── Dockerfile
│   │           ...
│   │
│   └── inference_test_images
│   │   └── 1.jpg
│   │   │   10.jpg
│   │       ...
│   │
│   └── src
│       └── preprocessing.py
│       │   train.py
│       │   ...
│       └── model
│           └── utils.py
│               ...
│