Introducing MLOps — Why we need it, and how to apply it in your company (3/3)


This is Part III, the last part of an archive of my tech talk `Introducing MLOps — Why we need it, and how to apply it in your company` at Code Chrysalis in September 2021.

In the previous episode, we presented why we need MLOps using a simplified short story, with a real-world case study at the end. Now that we've covered the why, we jump into how we can apply it to a simple project that starts with a Jupyter Notebook and ends with a full-blown high-level architecture.

Table of Contents

  1. ML Productionisation and MLOps
  2. Why do we need MLOps?
  3. How do you apply MLOps? + a simple Flow (this article)

How??

MLOps from scratch can be quite overwhelming, especially as you stare at your single 500-cell Jupyter Notebook, not knowing which cell to automate first. A good set of items to start with is:

  1. Look at the big picture first
  2. Find out which tools are apt
  3. Plan out your MLOps journey

Look at the big picture first

As you take a step back and breathe, start by sketching out your whole process on a simple diagram. This way you can easily spot bottlenecks and low-hanging fruit to automate first in your journey. You can try the following for starters:

  • Map out your current architecture or workflow
  • Assess your current organisation's MLOps level (there's a good reference from GCP)
  • Check your current and planned team structure

Assuming you have multiple team members, you can assign areas based on their strengths!

The diagram above applies to a multi-member team structure, but as you list out what your notebook cells do (or whichever state your current architecture is in), you can also assign people or priority levels to each item so it's easy to follow your own recipe.

Find out which tools are apt

As there are a plethora of MLOps tools out in the wild, each more awesome than the last, it can be paralysing to decide which ones to use.

Many an ML engineer has been plagued by decision paralysis

A good rule of thumb here is to look back at your diagram from the “Look at the big picture first” step, see which component has the top priority, and find a tool that matches two things:

  • your familiarity with it / the languages it supports
  • how apt that tool is for that specific component (is it model deployment? is it model versioning? is it data processing?)

As you navigate with these two criteria in mind, you can easily filter out the noise. To be frank, you don't even need any of these tools at first; you can simply separate your notebook's cells into their own clean `.py` files to start with (see the sketch below)!
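For instance, a minimal split might look like the sketch below. The dataset, model choice, and function names are hypothetical stand-ins for whatever your notebook actually does:

```python
# model.py — a minimal sketch of notebook cells moved into a clean module.
# The dataset and model are placeholders; swap in your own.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def load_data():
    """Formerly the 'data loading' cells of the notebook."""
    X, y = load_iris(return_X_y=True)
    return train_test_split(X, y, test_size=0.2, random_state=42)


def train_model(X_train, y_train):
    """Formerly the 'training' cells of the notebook."""
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    return model


def evaluate(model, X_test, y_test):
    """Formerly the 'evaluation' cells of the notebook."""
    return model.score(X_test, y_test)


if __name__ == "__main__":
    X_train, X_test, y_train, y_test = load_data()
    model = train_model(X_train, y_train)
    print(f"Accuracy: {evaluate(model, X_test, y_test):.3f}")
```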

And definitely don’t be shy to ask around. There are a lot of resources and communities such as MLOps Community and MLOps Subreddit that you can browse around. Who knows, there may be a post that already solves your specific problem!

Plan out your MLOps journey

Now that you’ve roughly cleared out what to do and what tools to use, it’s time to plan out your journey! One good mindset here is to move with baby steps. This way you can avoid overwhelming yourself with huge migration tasks and whatnot, and you can easily review your steps as you go along for improvements.

Now without further ado, we can move on to an actual example in A Simple Flow.

A Simple Flow

We divide our path into three stages — starting from a single Jupyter Notebook, to a simple API served on a hosting site, to a complete architecture covering the ML, DEV, and PROD phases.

Most projects (if not all) start with a simple Jupyter Notebook, or even a Python script. This keeps the experiment feedback loop quick and makes it easy to run the components of model building in one go. However, once a model needs a wrapper, it starts requiring its own server, which can take the form of an API or even an edge solution where the model runs directly in a mobile app, for example.

And, similar to how Maurice and Jim struggled with scaling the application, we will need to start building the infrastructure for experiments, implementation, deployments, and monitoring in order to serve a larger user base.

Implementing MLOps consists of incremental improvements, jumping around the phases

Usually MLOps isn’t a sequential run of improvements through the phases. It involves jumping back and forth phases, adding minor to major upgrades and integrating them back into the whole process. The cool thing with this is if you have multiple people in your team, you can even do them in parallel!

To make the fast-moving GIF above easier to follow, the sample steps starting from a single Jupyter notebook are as follows:

Experimental Stage

  1. Single Jupyter notebook (ML Phase)

MVP Stage

  1. Jupyter notebook (ML Phase)
  2. Flask API (DEV Phase), sketched below
  3. Hosting site serving (PROD Phase)
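
The MVP stage's API can be tiny. Below is a minimal Flask sketch that wraps the hypothetical `model.py` module from earlier; the `/predict` route and payload format are illustrative, not a prescription:

```python
# app.py — a minimal Flask API sketch wrapping the hypothetical model.py module.
from flask import Flask, jsonify, request

from model import load_data, train_model  # placeholder module from the notebook split

app = Flask(__name__)

# Train once at startup for simplicity; a real service would load a persisted model.
X_train, X_test, y_train, y_test = load_data()
model = train_model(X_train, y_train)


@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [5.1, 3.5, 1.4, 0.2]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": int(prediction)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```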

Productionisation Stage + Steps

  1. Unit + Integration Tests (DEV Phase), sketched with pytest below
  2. Shareable notebooks on a cloud platform, ex. — GCP Vertex AI or AWS SageMaker Notebooks (ML Phase)
  3. Docker-containerised Frontend & Backend (DEV Phase)
  4. Caching, ex. — Redis (DEV Phase)
  5. Load Tests, ex. — Locust (DEV Phase), sketched below
  6. Auto-scaling deployments, ex. — Kubernetes (k8s) (PROD Phase)
  7. GitOps for infrastructure, ex. — Helm + Terraform on k8s (PROD Phase)
  8. Model versioning, ex. — GCP Vertex AI Model Serving or Kubeflow (ML Phase)
  9. Dataset versioning, ex. — DVC (ML Phase)
  10. Monitoring, ex. — Grafana (PROD Phase)
  11. Alerts, ex. — Sentry / Slack (PROD Phase)

…and the list goes on as you automate your operations and improve the efficiency of each phase.
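To make steps 1 and 5 more concrete, here are two minimal sketches. First, a unit test with pytest; it assumes the hypothetical `model.py` module from the notebook-split sketch earlier, and the sanity-check threshold is purely illustrative:

```python
# test_model.py — a minimal pytest sketch for step 1 (unit tests),
# assuming the hypothetical model.py module from earlier.
from model import evaluate, load_data, train_model


def test_model_beats_random_baseline():
    X_train, X_test, y_train, y_test = load_data()
    model = train_model(X_train, y_train)
    # Sanity check only; replace with a quality bar that matters for your use case.
    assert evaluate(model, X_test, y_test) > 0.5
```

And a load test with Locust, pointed at the hypothetical `/predict` endpoint from the Flask sketch above; you could run it with `locust -f locustfile.py --host http://localhost:8080`:

```python
# locustfile.py — a minimal Locust sketch for step 5 (load tests),
# targeting the hypothetical /predict endpoint from the Flask example.
from locust import HttpUser, between, task


class PredictionUser(HttpUser):
    wait_time = between(0.5, 2)  # simulated think time between requests

    @task
    def predict(self):
        self.client.post("/predict", json={"features": [5.1, 3.5, 1.4, 0.2]})
```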

Looking back at what you've incrementally built over your sprints and quarters can be breathtaking!

And as with any project, don't forget to celebrate the small and big wins! Each improvement adds compound interest to your optimised architecture, which ultimately prevents bugs, enables data-driven analyses, and simply makes development a fun process with less toil.

Takeaways

This small workflow is merely an example. The cool (yet daunting) thing about this is there is no one way to do it! Feel free to use whatever stack your team is used to. You can use a DAG runner like Airflow, or even GitHub Actions, for the whole thing if that floats your boat!
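If you went the Airflow route, for example, a pipeline could start as small as the sketch below (assuming Airflow 2.x; the DAG id, schedule, and task bodies are hypothetical placeholders for your own steps):

```python
# dags/retrain_pipeline.py — a minimal Airflow 2.x DAG sketch; task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_data():
    # Placeholder: pull the latest training data
    pass


def train_model():
    # Placeholder: retrain the model on the fresh data
    pass


def deploy_model():
    # Placeholder: push the new model to serving
    pass


with DAG(
    dag_id="ml_retrain_pipeline",
    start_date=datetime(2021, 9, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_data", python_callable=extract_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    deploy = PythonOperator(task_id="deploy_model", python_callable=deploy_model)

    extract >> train >> deploy
```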

Though it is a difficult endeavour, it can be very fun and fulfilling, especially as you feel the improvements positively affecting your development time and effort. That manual process you kept repeating or copy-pasting every time, now automated with a simple script? Imagine those small bothersome things just gone, thanks to none other than yourself (or your teammates)!

This is a never-ending process of improvement. As they say, CI/CD (Continuous Improvement/Delivery) is a lifestyle.

Parting Words

As this archive is only a high-level introduction to what MLOps is, why it's needed, and how to roughly implement it, there are still a lot of great references out there on the interwebs to look at!

As long as you take things step by step, testing what works and what doesn’t, and learning from each stage as you improve your ML infrastructure, you’ll definitely get to a great point where your models are properly served to your users.

I hope this quick introduction has provided great insight, and good luck on your MLOps journey!

(thanks to Jayson Cunanan, Ph. D. and Felix Kirmse for the pre-tech-talk reviews!)
