Kapitan Scout - ML Deployment Simplified

When embarking on a journey as AI engineers, many of us took different paths: some started with statistics, some decided that ML fundamentals were the way to go, and others jumped straight into applying existing Python libraries to real-life problems. Regardless of the path taken, most of us focused on building ML models, be it through Exploratory Data Analysis, Feature Cleaning/Engineering, or Model Training/Tuning, in order to produce the “best” performing model for a given problem. It is at this stage that AI engineers face a common question: what do we do with a trained model?

ML Deployment

The simple answer is, “let’s just deploy it”. However, multiple considerations come into play when deploying an ML model. They can be roughly divided into two groups: ML deployment considerations, and others. Below are some common ones:

  1. ML deployment considerations:
    • What technology should we use to deploy the model?
    • What features are essential to an ML deployment? (upgrade/rollback, metrics, scalability etc.)
    • How does the customer feed input data into the model? Does it require preprocessing?
    • How does the customer receive the results? Do they require postprocessing?
    • Does the model need to predict continuously, or only once?
  2. Other considerations:
    • Do the current AI engineers have the skills to deploy the model?
    • Security and safety issues

Based on the above, deploying an ML model is clearly no easy feat. To reduce the hassle of coming up with a deployment strategy and to streamline the entire deployment process, we created Kapitan Scout (“KS”) and Kapitan Scout Platform (“KSP”).

Versatile ML Template – Kapitan Scout

With that said, what we need is a simple, hassle-free way for AI engineers to deploy trained ML models. Our approach is KS:

  • ML template completed by AI engineers
  • ML template wrapped into REST API image automatically
  • REST API image deployed onto KSP automatically

The diagram above gives a rough idea of how KS works. As you can see, the last two steps are automated by CI/CD, and only the first is manual. This means that AI engineers only have to complete the ML template in order to deploy their model with KS.

If you are still interested, let’s take a deeper dive into the three components above.

1) ML Template

The template provides a general code structure and a guide for AI engineers to complete. AI engineers can customize the template to suit their project requirements, which keeps it flexible across different use cases. For example, they can customize:

  • methods of loading their model (e.g. from an S3 bucket, from local storage etc.)
  • machine learning framework (e.g. scikit-learn, TensorFlow etc.)
  • prediction logic flow
  • metrics (e.g. confusion matrix for classification problems, R-squared for regression etc.)
  • preprocessing of input data and postprocessing of output data
  • sending feedback to update model performance (for multi-armed bandit)
  • deployment options (e.g. single model, A/B testing or multi-armed bandit)

GitLab repository for the template. Users can modify the model.py script to tailor it to their models.

Snippet of the model.py template code. Users follow the docstring instructions of each model function in the model.py script and edit it accordingly.
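
To make the customization points above more concrete, here is a minimal sketch of what a completed template could look like. It is not the actual KS template: the class name Model, the preprocess and postprocess helpers, and the threshold "model" are all hypothetical. Only the predict(X, features_names) shape is borrowed from the convention used by Seldon-Core's Python wrapper.

```python
from typing import List, Optional, Sequence


class Model:
    """Hypothetical stand-in for the class an AI engineer completes in model.py."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.loaded = False

    def load(self) -> None:
        """Load the trained model (e.g. from local storage or an S3 bucket).

        Assumption: a fixed threshold stands in for a real model artifact.
        """
        self.loaded = True

    def preprocess(self, rows: Sequence[Sequence[float]]) -> List[float]:
        """Reduce each input row to one score (here: the mean of its features)."""
        return [sum(row) / len(row) for row in rows]

    def postprocess(self, scores: Sequence[float]) -> List[str]:
        """Map raw scores to human-readable labels."""
        return ["positive" if s >= self.threshold else "negative" for s in scores]

    def predict(
        self,
        X: Sequence[Sequence[float]],
        features_names: Optional[Sequence[str]] = None,
    ) -> List[str]:
        """Prediction logic flow: load lazily, preprocess, then postprocess."""
        if not self.loaded:
            self.load()
        return self.postprocess(self.preprocess(X))
```

Calling Model().predict([[0.9, 0.7], [0.1, 0.2]]) returns ["positive", "negative"]. Swapping the body of load for a real framework call is where the choice of loading method and ML framework would come in.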

2) Package as Docker Image

KS uses Seldon-Core and Docker to wrap the ML template as a REST API image. Deploying the ML template as a REST API makes the deployment more versatile: AI engineers can send a request to the REST API and get results back from most clients (e.g. an existing React.js front-end project). In addition, packaging the ML template as an image makes it easier to deploy, without having to worry about the underlying operating system.
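
As a rough illustration of calling such a REST API from Python, the sketch below builds and sends a prediction request using only the standard library. The /api/v1.0/predictions path and the {"data": {"ndarray": ...}} payload follow the Seldon-Core v1 REST protocol; whether a KS deployment exposes exactly this path is an assumption, and the base URL is hypothetical.

```python
import json
import urllib.request
from typing import List


def build_prediction_request(
    base_url: str, rows: List[List[float]]
) -> urllib.request.Request:
    """Build a POST request carrying the input rows as a JSON ndarray payload."""
    payload = {"data": {"ndarray": rows}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1.0/predictions",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, rows: List[List[float]], timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_prediction_request(base_url, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a deployment reachable at a hypothetical address, predict("http://localhost:9000", [[0.9, 0.7]]) would return the parsed response body. Any other HTTP client, including a browser front end, can send the same JSON.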

3) Deploy to KSP

KS uses Helm to deploy onto KSP (KSP is installed on a Kubernetes cluster). This allows the ML deployment to be upgraded and rolled back easily, in addition to the features that KSP provides (covered below).
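
For readers unfamiliar with Helm, the upgrade and rollback flow roughly corresponds to the two commands assembled below. This is only a sketch of what a CI/CD job might run, not the actual KS pipeline: the release name, chart path, namespace and the image.tag value key are all hypothetical and depend on the chart in use.

```python
import subprocess
from typing import List


def helm_upgrade_command(
    release: str, chart: str, image_tag: str, namespace: str
) -> List[str]:
    """Install the release if it is missing, otherwise upgrade it to a new image."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--set", f"image.tag={image_tag}",  # assumed value key, chart-specific
    ]


def helm_rollback_command(release: str, revision: int, namespace: str) -> List[str]:
    """Roll the release back to an earlier revision number."""
    return ["helm", "rollback", release, str(revision), "--namespace", namespace]


def run(command: List[str]) -> None:
    """Execute a Helm command, raising an error if it exits with a failure."""
    subprocess.run(command, check=True)
```

For example, run(helm_upgrade_command("my-model", "./chart", "v2", "ks")) would roll out a new image, and run(helm_rollback_command("my-model", 1, "ks")) would return to revision 1.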

All-inclusive ML Platform – Kapitan Scout Platform

We mentioned above that KS is deployed to KSP, so you may be wondering what KSP actually does. In simple terms, KSP is deployed on a Kubernetes cluster (k3s or K8s) and:

  • manages KS deployments
  • scrapes metrics from KS deployments
  • displays the metrics on a dashboard

The diagram above gives a rough idea of how KSP works. If you are still interested, let’s take a deeper dive into the three components above.

1) Manages KS Deployments

As mentioned in the previous section, the ML template is wrapped into a REST API with the help of Seldon-Core. Within KSP, we deployed a Seldon-Core operator which manages KS deployments (aka seldon-deployments). The Seldon-Core operator not only provides basic functions such as upgrade, rollback, auto-scaling and self-healing, but also helps implement the more complex deployment options defined in the KS template, such as A/B testing, multi-armed bandit and feedback routing.
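
To give a feel for how a multi-armed bandit deployment uses feedback, here is a minimal epsilon-greedy router sketch. It is an illustration of the general technique, not the router that KS or Seldon-Core ships. The route and send_feedback method names mirror the shape of a Seldon-Core router component, while the class name and its internals are hypothetical.

```python
import random
from typing import List, Optional


class EpsilonGreedyRouter:
    """Route requests to the best-performing model, exploring at rate epsilon."""

    def __init__(
        self, n_branches: int, epsilon: float = 0.1, seed: Optional[int] = None
    ) -> None:
        self.n_branches = n_branches
        self.epsilon = epsilon
        self.counts: List[int] = [0] * n_branches      # feedback count per model
        self.rewards: List[float] = [0.0] * n_branches  # summed reward per model
        self._rng = random.Random(seed)

    def _mean_reward(self, branch: int) -> float:
        if self.counts[branch] == 0:
            return 0.0
        return self.rewards[branch] / self.counts[branch]

    def route(self, features=None, feature_names=None) -> int:
        """Return the index of the model that should serve this request."""
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(self.n_branches)  # explore
        return max(range(self.n_branches), key=self._mean_reward)  # exploit

    def send_feedback(
        self, features, feature_names, reward: float, truth=None,
        routing: Optional[int] = None,
    ) -> None:
        """Record the reward observed for the model that served the request."""
        if routing is None:
            return
        self.counts[routing] += 1
        self.rewards[routing] += reward
```

An A/B test is the simpler case where traffic is split by fixed weights and no feedback is needed. The bandit shifts traffic towards whichever model earns the higher average reward.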

2) Scrape Metrics

Metrics are an essential part of any ML deployment. For KSP, we incorporated Prometheus to scrape the metrics exposed by KS deployments. We implemented two default metrics, “model_latency_seconds” and “model_accuracy_rate”, but users are free to implement their own depending on their model requirements, such as recall and precision for classification models and R-squared for regression models.
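
The sketch below shows one way the two default metrics could be tracked and exposed. The list-of-dictionaries format with "type", "key" and "value" fields is the one Seldon-Core's Python wrapper accepts from a metrics() method. How KS actually computes these values is not shown in this article, so the MetricsTracker class and the choice of GAUGE for both metrics are assumptions.

```python
import time
from typing import Any, Callable, Dict, List, Union

Metric = Dict[str, Union[str, float]]


class MetricsTracker:
    """Track the latency of the last prediction and a running accuracy rate."""

    def __init__(self) -> None:
        self.last_latency_seconds = 0.0
        self.correct = 0
        self.total = 0

    def timed_predict(self, predict_fn: Callable[[Any], Any], X: Any) -> Any:
        """Run predict_fn on X and remember how long the call took."""
        start = time.perf_counter()
        result = predict_fn(X)
        self.last_latency_seconds = time.perf_counter() - start
        return result

    def record_feedback(self, predicted: Any, truth: Any) -> None:
        """Update the accuracy counters once the true label is known."""
        self.total += 1
        if predicted == truth:
            self.correct += 1

    def metrics(self) -> List[Metric]:
        """Return the current values in a Seldon-style custom metrics list."""
        accuracy = self.correct / self.total if self.total else 0.0
        return [
            {"type": "GAUGE", "key": "model_latency_seconds",
             "value": self.last_latency_seconds},
            {"type": "GAUGE", "key": "model_accuracy_rate", "value": accuracy},
        ]
```

A custom metric such as recall or R-squared would be added as one more entry in the returned list, which Prometheus then picks up on its next scrape.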

3) Display Metrics on Dashboard

After the metrics are scraped by Prometheus, KSP uses Grafana to query them and display them on a dashboard. Below is an example of what our default dashboard looks like.

Further Improvements

In the near future, we would like to make KS more user-friendly and add more features to it.

To make KS more user-friendly, we hope to create a GUI for Kapitan Scout. Currently, users have to enter bash commands, locate files to modify, and even upload the saved model manually. We plan to combine these steps into a workflow that can be completed through a GUI, which would make modifying the template more intuitive.

Screenshot from Seldon Deploy

One feature that our users have suggested is the ability to detect dataset drift. Live data can and will evolve into something quite different from the original training dataset over time. This results in lower-quality predictions unless AI engineers update their models regularly. Therefore, the ability for KS to detect dataset drift automatically and notify users, or even retrain the model automatically, would be very useful in keeping deployments relevant.
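
KS does not implement drift detection yet, so the following is only a sketch of one common approach: the Population Stability Index (PSI), which compares how a single feature is distributed in the training data and in live data. The function names, the equal-width binning and the 0.25 alert threshold (a widely quoted rule of thumb) are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Return the fraction of values falling into each bin defined by edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v > e)  # number of edges below v
        counts[index] += 1
    floor = 1e-6  # avoids log(0) for empty bins
    return [max(c / len(values), floor) for c in counts]


def population_stability_index(
    reference: Sequence[float], live: Sequence[float], n_bins: int = 10
) -> float:
    """PSI between a reference (training) sample and a live sample of a feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    edges = [lo + width * i for i in range(1, n_bins)]  # equal-width bin edges
    ref = _bin_fractions(reference, edges)
    cur = _bin_fractions(live, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def has_drifted(
    reference: Sequence[float], live: Sequence[float], threshold: float = 0.25
) -> bool:
    """Flag drift when the PSI exceeds the chosen threshold."""
    return population_stability_index(reference, live) > threshold
```

A check like has_drifted(training_feature, recent_feature) could run on a schedule and trigger a notification or a retraining job when it returns True.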


What we have shared above is just a brief overview of the problem that Kapitan Scout and Kapitan Scout Platform are meant to solve, the features they currently have, how they are meant to be used, and future improvements. The project is not ready to be open-sourced yet, but we hope to do so in the future.

Zhong Hao: A physicist turned ML engineer, Zhong Hao is the guy who integrated the various deployment types into the project. He enjoys experimenting with fresh ideas and projects, due to his passion for learning.