Table of Contents
· Part 1: What is the motivation behind MLOps?
· 1.1 ML Project Lifecycle
· 1.2 MLOps
· Part 2: End-to-End Machine Learning Workflow
· 2.1 Data Engineering
· 2.2 Model Engineering
· 2.3 Model Deployment
· Part 3: MLOps Principles and Best Practices
· 3.1 Automation
· 3.2 Versioning
· 3.3 Testing
· 3.4 Tooling
· Reference
Part 1: What is the motivation behind MLOps?
1.1 ML Project Lifecycle
Step 1: Business understanding: what is the ideal outcome, and what is the evaluation metric?
Step 2: Prototype ML model: conduct a feasibility study, keep the first model simple, and get the infrastructure right.
Step 3: Production: refine the ML model and generate models that can be used for production purposes.
MLOps is a concept that applies to the production stage.
1.2 MLOps
There’s a fundamental difference between building an ML model in a Jupyter notebook and deploying an ML model into a production system that generates business value.
According to the “2020 State of Enterprise Machine Learning” report, the main challenges people face when developing ML capabilities are scale, version control, model reproducibility, and aligning stakeholders.
The complete development pipeline for a machine learning project includes three levels of change: Data, ML Model, and Code. This means that in machine learning-based systems, the trigger for a build might be the combination of a code change, data change, or model change. This is also known as the “Changing Anything Changes Everything” principle.
The term MLOps is defined as “the extension of the DevOps methodology to include Machine Learning and Data Science assets as first-class citizens within the DevOps ecology”.
MLOps is a relatively new discipline, and its evolution is summarized in the following figure:
Part 2: End-to-End Machine Learning Workflow
Every ML-based software system includes three main artifacts: Data, ML Model, and Code. Corresponding to these artifacts, the typical machine learning workflow consists of three main phases:
- Data Engineering: data acquisition & data preparation,
- ML Model Engineering: ML model training & serving, and
- Code Engineering: integrating the ML model into the final product.
The Figure below shows the core steps involved in a typical ML workflow.
2.1 Data Engineering
Definition
Data Engineering is an iterative and agile process for exploring, combining, cleaning and transforming raw data into curated datasets for data integration, data science, data discovery and analytics/business intelligence (BI) use cases.
Collecting good data sets has a huge impact on the quality and performance of the ML model. The famous saying
“Garbage In, Garbage Out”
means, in the machine learning context, that the ML model is only as good as your data. Therefore, the data used to train the ML model indirectly influences the overall performance of the production system. The amount and quality of the data set are usually problem-specific and can be discovered empirically.
Components
Data engineering includes the following operations:
- Data acquisition: collecting and labeling data, data backup, privacy compliance, metadata cataloging
- Data enrichment: synthetic data/augmented data
- Data exploration and validation: generating data statistics, identifying obvious errors, data visualization
- Data cleaning (wrangling): missing-value imputation, fixing outliers
- Data splitting: splitting the data into training, validation, and test datasets (see the sketch below)
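As an illustration of the data-splitting step, here is a minimal sketch using scikit-learn. The file name, the "label" column, and the 60/20/20 split ratios are assumptions made for the example, not part of any specific project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical curated dataset with a "label" column.
df = pd.read_csv("curated_dataset.csv")
X, y = df.drop(columns=["label"]), df["label"]

# First split off the test set (20%), then carve a validation set out of the
# remaining data (20% of the original = 25% of the remainder).
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=42
)
```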
2.2 Model Engineering
Definition
The core of the ML workflow is the phase of writing and executing machine learning algorithms to obtain an ML model.
Components
It includes the following procedures:
- Model training: feature engineering, hyper-parameter tuning, neural network architecture design (a tuning sketch follows this list)
- Model evaluation (test): model comparison, error analysis
- Model packaging: exporting the trained model into a specific format
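The following is a minimal sketch of the model training and evaluation steps, using cross-validated grid search for hyper-parameter tuning. The toy Iris dataset, the random forest model, and the grid values are assumptions for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data stands in for the project's curated training set.
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyper-parameter tuning via cross-validated grid search (grid values are illustrative).
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="f1_macro"
)
search.fit(X_train, y_train)

# Model evaluation (test): inspect the best candidate on the hold-out set.
best_model = search.best_estimator_
print("Best params:", search.best_params_)
print(classification_report(y_val, best_model.predict(X_val)))
```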
Additional Notes on Model Training
Model training can be offline learning or online learning. In the case of offline learning, the model is trained on a set of already collected data. After deployment to the production environment, the ML model remains constant until it is re-trained; in the meantime, the model sees a lot of real-world data and can become stale. This phenomenon is called ‘model decay’ and should be carefully monitored. In the case of online learning, the model is regularly re-trained as new data arrives, e.g. as data streams. This is usually the case for ML systems that use time-series data, such as sensor or stock-trading data, in order to accommodate temporal effects in the ML model.
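To make the distinction concrete, here is a minimal sketch of online learning with scikit-learn’s SGDClassifier, which supports incremental updates via partial_fit. The mini-batch stream and the labels are simulated; all numbers are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])
model = SGDClassifier()

# Offline learning would call model.fit(X, y) once on the full, already-collected dataset.
# Online learning instead updates the model incrementally as new mini-batches arrive:
for step in range(100):
    X_batch = rng.normal(size=(32, 5))                    # simulated incoming data
    y_batch = (X_batch.sum(axis=1) > 0).astype(int)       # simulated labels
    model.partial_fit(X_batch, y_batch, classes=classes)  # incremental re-training
```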
Additional Notes on Model Evaluation
In the model analysis component, we conduct a deep analysis of the training results and ensure that our exported models are performant enough to be pushed to production.
Additional Notes on Model Packaging
Language-agnostic exchange formats include:
- ONNX (Open Neural Network eXchange): ONNX was created to allow any ML tool to share a single model format (see the export sketch after this list).
- PFA (Portable Format for Analytics)
- PMML (Predictive Model Markup Language): PMML has been standardized by the Data Mining Group (DMG). Basically, a .pmml file describes a model and pipeline in XML.
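As a sketch of model packaging, the snippet below exports a scikit-learn model to ONNX using the skl2onnx converter. The logistic regression model, the Iris data, and the 4-feature input shape are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train a small model on toy data (stand-in for the project's real model).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Export the trained model into the language-agnostic ONNX format.
onnx_model = convert_sklearn(model, initial_types=[("input", FloatTensorType([None, 4]))])
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```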
2.3 Model Deployment
Definition
Once we have trained a machine learning model, we need to deploy it as part of a business application, such as a mobile or desktop application. The final stage of the ML workflow is therefore the integration of the previously engineered ML model into existing software.
Components
Model deployment includes:
- ML model prediction/serving
- ML model performance monitoring/logging
- Deployment strategies: Docker
Additional Notes on Model Prediction
The difference between ML model training and ML model prediction can be shown in the following image:
ML model prediction can be batch predictions or real-time predictions.
- Batch predictions: the deployed ML model makes a set of predictions based on historical input data. This is often sufficient for data that is not time-dependent, or when it is not critical to obtain real-time predictions as output.
- Real-time predictions (a.k.a. on-demand predictions): predictions are generated in real time using the input data that is available at the time of the request.
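The following is a minimal sketch of a real-time (on-demand) prediction service using Flask. The model file path, the endpoint name, and the JSON payload shape are assumptions for illustration. Batch predictions, by contrast, would simply load the same model in a scheduled job and score a whole table of historical data at once.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # a previously packaged model (hypothetical path)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body such as {"features": [[5.1, 3.5, 1.4, 0.2]]}
    payload = request.get_json()
    prediction = model.predict(payload["features"]).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```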
Additional Notes on Deployment Strategies
As ML model inference is considered stateless, lightweight, and idempotent, containerization has become the de-facto standard for delivery.
One ubiquitous way is to package the whole ML tech stack (dependencies) and the code for ML model prediction into a Docker container. Then Kubernetes or an alternative (e.g. AWS Fargate) does the orchestration.
Part 3: MLOps Principles and Best Practices
3.1 Automation
The level of automation of the Data, ML Model, and Code pipelines determines the maturity of the ML process.
The following pictures show the CI/CD (Continuous Integration/Continuous Deployment) pipeline automation:
The automation relies on the following components. MLOps is an ML engineering culture that includes the following practices:
- Continuous Integration (CI) extends the testing and validation of code and components by adding the testing and validation of data and models.
- Continuous Delivery (CD) concerns the delivery of an ML training pipeline that automatically deploys another service: the ML model prediction service.
- Continuous Training (CT) is a property unique to ML systems; it automatically re-trains ML models for re-deployment.
- Continuous Monitoring (CM) concerns monitoring production data and model performance metrics, which are tied to business metrics (a minimal drift-check sketch follows this list).
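As a sketch of what Continuous Monitoring can look like at its simplest, the snippet below compares the distribution of one production feature against its training baseline with a two-sample Kolmogorov–Smirnov test. The threshold, the feature arrays, and the drift they exhibit are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, alpha=0.01):
    """Flag drift when the live distribution differs significantly from training."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

# Illustrative data: the live feature has shifted relative to training.
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=5_000)
live_feature = rng.normal(0.5, 1.0, size=1_000)
print(feature_drifted(train_feature, live_feature))  # True -> consider triggering re-training
```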
A complete end-to-end automated pipeline should look like this:
- We iteratively try out new ML ideas in which some of our pipeline components are updated (introducing a new feature, for example, requires updating the data transformation component). The output of this stage is the source code of the new ML pipeline components, which is then pushed to a source repository of the target environment.
- The presence of new source code triggers the CI/CD pipeline, which in turn builds the new components and pipeline, runs the corresponding unit and integration tests to make sure everything is correctly coded and configured, and finally deploys the new pipeline to the target environment if all tests pass. Unit and integration tests for ML systems deserve an independent article of their own.
- The newly deployed pipeline is automatically executed in production based on a schedule, the presence of new training data, or some other trigger. The output of this stage is a trained model that is pushed to the model registry and continuously monitored.
3.2 Versioning
The model needs to be versioned, and so do the code and data. We also need to track the experiment setup.
Three popular version control tools in ML are DVC, MLflow, and Sacred; check Frameworks for Machine Learning Model Management for more details. Here, we list the summaries:
Another popular tool for ML experiment tracking is the Weights & Biases (wandb) library; however, it is not listed in the above comparison. Best Tools to Manage Machine Learning Projects provides a more comprehensive comparison.
From the user’s perspective, for each trained model, we expect to have definitive information regarding the data, the trained model, and its deployment scenario. The following table lists what kind of information we want to track in order to reach this target.
The role of the pipeline metadata storage is to record all details about our ML pipeline executions. This is very important in order to keep the lineage between components and reproduce deployed models anytime needed.
- The versions of our pipeline and component source code that were executed.
- The input arguments that were passed to our pipeline.
- The artifacts/outputs produced by each executed component of our pipeline, such as the path to the raw data, transformed datasets, validation statistics and anomalies, the trained model, etc.
- The model evaluation metrics and the model validation decision regarding model deployment, which are produced during the model analysis and validation component (a minimal tracking sketch follows this list).
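The following is a minimal sketch of recording such metadata with MLflow, one of the tools mentioned above. The toy dataset, the single hyper-parameter, and the metric name are illustrative assumptions; a real pipeline would log the full set of items listed above.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    # Pipeline input arguments / hyper-parameters.
    mlflow.log_param("C", 1.0)
    model = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)

    # Evaluation metrics and the model artifact itself.
    mlflow.log_metric("val_accuracy", accuracy_score(y_val, model.predict(X_val)))
    mlflow.sklearn.log_model(model, "model")
```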
3.3 Testing
The complete development pipeline includes three essential components: the data pipeline, the ML model pipeline, and the application pipeline. In accordance with this separation, we distinguish three scopes for testing in ML systems: tests for features and data, tests for model development, and tests for ML infrastructure.
Data (or Feature) Tests
Data validation: automatic checks for data and feature schema/domain.
Feature importance tests to understand whether new features add predictive power.
Feature creation code, data cleaning code, etc. should be covered by unit tests (see the sketch below).
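As a sketch of unit-testing feature creation code with pytest, the feature function below (add_bmi) and its column names are hypothetical, invented only to show the pattern of checking both the computed values and the absence of side effects.

```python
import pandas as pd
import pandas.testing as pdt

def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature creation code: body-mass index from weight and height."""
    out = df.copy()
    out["bmi"] = out["weight_kg"] / out["height_m"] ** 2
    return out

def test_add_bmi_computes_expected_values():
    df = pd.DataFrame({"weight_kg": [80.0], "height_m": [2.0]})
    result = add_bmi(df)
    assert result.loc[0, "bmi"] == 20.0

def test_add_bmi_does_not_mutate_input():
    df = pd.DataFrame({"weight_kg": [80.0], "height_m": [2.0]})
    original = df.copy()
    add_bmi(df)
    pdt.assert_frame_equal(df, original)
```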
Model Development Tests
Validating the performance of a model.
Comparing the model with other models.
Model training should be reproducible.
Unit tests for the algorithm’s correctness on obvious samples (see the sketch below).
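The following is a sketch of two such model-development tests: training reproducibility with a fixed seed, and correctness on obvious samples. The random forest, the synthetic data, and the "obvious" decision rule are toy assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train(seed: int = 42) -> RandomForestClassifier:
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] > 0).astype(int)  # an "obvious" decision rule for illustration
    return RandomForestClassifier(random_state=seed).fit(X, y)

def test_training_is_reproducible():
    X_check = np.random.default_rng(1).normal(size=(50, 3))
    preds_a = train(seed=42).predict(X_check)
    preds_b = train(seed=42).predict(X_check)
    assert (preds_a == preds_b).all()

def test_correct_on_obvious_samples():
    model = train()
    assert model.predict([[5.0, 0.0, 0.0]])[0] == 1
    assert model.predict([[-5.0, 0.0, 0.0]])[0] == 0
```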
Deployment Tests/Monitoring
Check that the deployment pipeline is consistent with the training pipeline.
Computational performance checking (see the latency sketch below).
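As a sketch of a computational-performance check, the test below measures prediction latency for a stand-in model and asserts a budget. The model, the batch size, and the 50 ms threshold are assumptions for illustration.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the deployed model and a representative request batch.
model = RandomForestClassifier(random_state=0).fit(
    np.random.rand(500, 10), np.random.randint(0, 2, 500)
)
request_batch = np.random.rand(32, 10)

def test_prediction_latency_within_budget():
    start = time.perf_counter()
    model.predict(request_batch)
    latency_ms = (time.perf_counter() - start) * 1000
    assert latency_ms < 50, f"Prediction took {latency_ms:.1f} ms"
```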
After these tests, we can compute an ML Test Score:
3.4 Tooling
MLOps must be a language-, framework-, platform-, and infrastructure-agnostic practice. MLOps should follow a “convention over configuration” implementation.
The MLOps technology stack should include tooling for the following tasks:
- data engineering,
- version control of data, ML models and code,
- continuous integration and continuous delivery pipelines,
- automating deployments and experiments,
- model performance assessment, and
- model monitoring in production.
The Linux Foundation’s LF AI project created a visualization for the ML/AI and MLOps tools. Another curated list of the production machine learning tools is maintained by the Institute for Ethical AI.
Reference
Blogs